Supporting data for "Measuring Airport Service Quality Using Machine Learning Algorithms"
The airport industry is a highly competitive market that has expanded quickly during the last two decades. Airport management usually measures passenger satisfaction using traditional methods, such as user surveys and expert opinions, which require time and effort to analyse. Recently, there has been considerable interest in employing machine learning techniques and sentiment analysis to measure passenger satisfaction. Sentiment analysis can be implemented using a range of different methods. However, it is still uncertain which techniques are better suited to recognising sentiment in a particular subject domain or dataset. In this paper, we analyse the sentiment of air travellers using five different algorithms, namely Logistic Regression, XGBoost, Support Vector Machine, Random Forest and Naïve Bayes. We obtain our dataset, a collection of reviews of around 600 airports, from the SKYTRAX website. We apply several pre-processing steps, including converting the textual reviews into numerical form using term frequency-inverse document frequency (TF-IDF) weighting and removing stopwords from the text using the NLTK list of stopwords. We evaluate our results using the accuracy, precision, recall and F1-score performance metrics. Our analysis shows that XGBoost provides the most accurate results when compared with the other algorithms.
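
The pre-processing and evaluation steps described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not say which libraries or TF-IDF variant were used, so the stopword set here is a tiny hypothetical stand-in for the NLTK list, the weighting is the plain tf * log(N / df) form, and the function names (`tokenize`, `tfidf`, `binary_metrics`), sample reviews and labels are invented for illustration. The five classifiers themselves are not shown.

```python
import math
from collections import Counter

# Assumption: a tiny illustrative stopword set, standing in for the NLTK list.
STOPWORDS = {"the", "a", "an", "was", "were", "is", "and", "of", "to", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = (word.strip(".,!?;:\"'()").lower() for word in text.split())
    return [t for t in tokens if t and t not in STOPWORDS]


def tfidf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1-score for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical reviews and labels, for illustration only.
reviews = [
    "The staff were friendly and helpful.",
    "The queue was long and the staff were rude.",
    "Clean terminal, friendly staff.",
]
vectors = tfidf(reviews)
scores = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

In this sketch a term that appears in every review, such as "staff", receives a weight of zero, which is the behaviour TF-IDF is meant to provide: words common to all documents carry no discriminating signal. Library implementations typically apply smoothing and normalisation, so their weights would differ from these.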