Thursday, May 1, 2025

Classification of Airline Customer Data


Introduction

The task under consideration is the classification of airline customer data to predict satisfaction. The data consists of a range of attributes: continuous variables such as departure delay and flight distance, ordinal survey ratings on a 0 (or 1) to 5 scale, and categorical data such as gender and whether the customer holds a loyalty card. Unlike in other classification problems such as fraud detection or loan default prediction, the target is reasonably balanced, split 44% ‘satisfied’ and 56% ‘neutral or unsatisfied’. A few of the attributes show a significant number of missing values – nearly 30% in some cases. Here, values were imputed using the mean or median, or dealt with in other ways.

Two machine learning techniques are commonly cited in the literature as appropriate for this kind of classification task: naïve Bayes and random forest, and these are the subject of this comparison. Naïve Bayes is a simple probabilistic algorithm that applies Bayes’ theorem under an assumption of feature independence. A random forest is a collection of decision trees, each trained on a random subset of the data, with the final classification decided by majority vote; this is also termed an ‘ensemble’ technique. Both are simple to implement and robust in that they are not regarded as prone to overfitting. Despite their simplicity, both have been reported as “surprisingly accurate” in use.
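The post itself contains no code, so purely as an illustration, a rough scikit-learn equivalent of the preprocessing and the two base models might look like the sketch below. The file name and column names (‘airline_satisfaction.csv’, ‘satisfaction’, ‘Arrival delay in minutes’) are assumptions, not taken from the post.

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score, roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    # Assumed file and column names -- the post does not specify them.
    df = pd.read_csv("airline_satisfaction.csv")

    # Impute missing values: median for the skewed delay attribute, mean
    # for the remaining numeric columns (the post notes nearly 30%
    # missing values in some attributes).
    delay = "Arrival delay in minutes"
    df[delay] = df[delay].fillna(df[delay].median())
    df = df.fillna(df.mean(numeric_only=True))

    # One-hot encode the categorical attributes (gender, loyalty, class, ...).
    X = pd.get_dummies(df.drop(columns=["satisfaction"]))
    y = (df["satisfaction"] == "satisfied").astype(int)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y)

    for name, model in [("Random forest", RandomForestClassifier(random_state=42)),
                        ("Naive Bayes", GaussianNB())]:
        model.fit(X_train, y_train)
        f1 = f1_score(y_test, model.predict(X_test))
        auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
        print(f"{name}: F-score {f1:.3f}, AUC {auc:.3f}")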

Results

Feature selection was undertaken by successively removing attributes with the column filter, one column at a time. Table 1 shows the scores after each removal. A fall of more than 0.002 against the ‘None’ baseline (the margin allows for rounding differences) means the model is negatively affected, i.e. the feature contributes to the model; conversely, a rise of more than 0.002 means the model gets better without the feature.
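As a hedged sketch of this leave-one-feature-out procedure (reusing the X, y, train/test split and scoring from the earlier snippet, and showing only the random forest side), the loop might look like the following; the feature names listed are assumptions matching the attribute list:

    # Retrain with each attribute removed and compare against the
    # all-features baseline (the 'None' row in Table 1). startswith()
    # also catches one-hot encoded columns such as 'Gender_Male'.
    def scores_without(feature):
        cols = [c for c in X.columns if not c.startswith(feature)]
        model = RandomForestClassifier(random_state=42)
        model.fit(X_train[cols], y_train)
        f1 = f1_score(y_test, model.predict(X_test[cols]))
        auc = roc_auc_score(y_test, model.predict_proba(X_test[cols])[:, 1])
        return f1, auc

    for feature in ["Gender", "Age", "Type of Travel", "Seat comfort"]:
        f1, auc = scores_without(feature)
        print(f"Without {feature}: F-score {f1:.3f}, AUC {auc:.3f}")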

With the random forest, most of the features make a contribution. For naïve Bayes, there are a number of features whose removal improves the model. 

Table 1: Feature selection - effect on model scores

                                     Random Forest         Naïve Bayes
Filtered Out                         F-score    AUC        F-score    AUC
None                                 0.892      0.968      0.778      0.881
Gender                               0.887      0.967      0.786      0.882
Customer loyalty                     0.871      0.958      0.785      0.878
Age                                  0.889      0.968      0.784      0.882
Type of Travel                       0.859      0.955      0.756      0.866
Class                                0.886      0.967      0.771      0.875
Online check-in                      0.888      0.964      0.780      0.876
Flight Distance                      0.890      0.967      0.775      0.882
Departure/Arrival time convenient    0.889      0.966      0.784      0.882
Ease of Online booking               0.873      0.948      0.779      0.875
Gate location                        0.873      0.948      0.785      0.882
Food and drink                       0.886      0.967      0.787      0.884
Seat comfort                         0.882      0.964      0.777      0.883
Inflight entertainment               0.890      0.967      0.786      0.888
On-board service                     0.888      0.967      0.783      0.882
Leg room service                     0.889      0.966      0.788      0.880
Baggage handling                     0.890      0.965      0.781      0.884
Checkin service                      0.884      0.964      0.792      0.879
Inflight service                     0.886      0.966      0.784      0.883
Cleanliness                          0.885      0.964      0.782      0.883
Departure delay in minutes           0.890      0.968      0.795      0.882
Arrival delay in minutes             0.892      0.967      0.793      0.882


With these set-ups (tuned parameters and selected features), the random forest model showed a slight decline in both F-score and AUC, while the naïve Bayes model improved; see Table 2.

Table 2: Results summary before and after tuning and feature selection

                                     Random Forest         Naïve Bayes
                                     F-score    AUC        F-score    AUC
Base model                           0.892      0.968      0.778      0.881
Tuned parameters and features        0.886      0.966      0.789      0.887
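The post does not record which random forest parameters were tuned or how, so the following is only a hypothetical illustration of how such a search might be run in scikit-learn, reusing the assumed training split from the earlier sketches; the grid values are assumptions.

    from sklearn.model_selection import GridSearchCV

    # Hypothetical search grid -- the actual tuned values are not given
    # in the post.
    search = GridSearchCV(
        RandomForestClassifier(random_state=42),
        param_grid={"n_estimators": [100, 300],
                    "max_depth": [None, 10, 20]},
        scoring="f1",
        cv=5)
    search.fit(X_train, y_train)
    print(search.best_params_, round(search.best_score_, 3))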

Conclusions

Overall, model performance was at a high level, with ‘accuracy’ generally in the 80-90% range and recall at 80-85%. This proved hard to improve on, whether by tuning parameters in the random forest model or by selecting which features to include. On the other hand, this makes both models simple to set up and run. Kelleher et al. (2015) declare that naïve Bayes models are often used “to define a baseline accuracy score” because they are so easy to implement.

Predicting customer satisfaction was not the aim in itself. What the model might do is allow the airline to analyse which features are important contributors to satisfaction, which means studying ‘accuracy’ in terms of the positive outcomes – hence the choice of evaluation measures such as recall, F1 and AUC. This worked well with the naïve Bayes model, where feature selection gave a clear outcome, with some attributes contributing positively to the model and some negatively. For the random forest algorithm this was less successful, as all attributes made a mild positive contribution. In terms of interpretability a single decision tree might have been preferable, although that would have brought other problems, such as a tendency to overfit and probably lower accuracy. Binary logistic regression is another method that might have been useful, as its outputs explicitly show each attribute’s contribution; a sketch follows below. Another option would be to use several models together: Khan et al. (2024) review the literature on classification problems (albeit for class-imbalance problems) and conclude that ensemble methods generally show better performance.
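As a minimal sketch of that logistic regression alternative, assuming the same imputed and encoded data as in the earlier snippets, the signed coefficients would give an explicit per-attribute contribution:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler

    # Standardise so that coefficient magnitudes are roughly comparable
    # across attributes.
    scaler = StandardScaler().fit(X_train)
    logit = LogisticRegression(max_iter=1000)
    logit.fit(scaler.transform(X_train), y_train)

    # Positive coefficients push towards 'satisfied', negative away.
    coefs = pd.Series(logit.coef_[0], index=X.columns).sort_values()
    print(coefs.head(5))   # strongest negative contributors
    print(coefs.tail(5))   # strongest positive contributors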

References

Kelleher, J., Mac Namee, B., & D’Arcy, A. (2015). Fundamentals of machine learning for predictive data analytics. MIT Press.

Khan, A., Chaudhari, O., & Chandra, R. (2024). A review of ensemble learning and data augmentation models for class imbalanced problems: Combination, implementation and evaluation. Expert Systems with Applications. https://doi.org/10.1016/j.eswa.2023.122778
