Introduction
The task under consideration is the classification of airline customer data to predict satisfaction. The data consist of a range of attributes: continuous variables such as departure delay and flight distance, and ordinal data from passenger surveys, rated 0 or 1 to 5. There are also some categorical attributes such as gender and whether the customer holds a loyalty card. Unlike in other classification problems such as fraud detection or loan default prediction, the target is reasonably balanced, with 44% ‘satisfied’ and 56% ‘neutral or unsatisfied’. A few of the attributes show a significant amount of missing values, nearly 30% in some cases; these were imputed using the mean or median, or dealt with in other ways.
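As an illustration, a minimal version of that imputation step in Python with scikit-learn might look like the sketch below. The file and column names are assumptions for illustration; median imputation is shown because delay data tend to have a long tail.

# Sketch: median imputation of missing numeric values.
# File and column names are illustrative assumptions.
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.read_csv("airline_satisfaction.csv")  # hypothetical file name

num_cols = ["Arrival Delay in Minutes", "Flight Distance"]
imputer = SimpleImputer(strategy="median")  # or strategy="mean"
df[num_cols] = imputer.fit_transform(df[num_cols])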
Two machine learning techniques are commonly found in the literature as appropriate for this kind of classification task: naïve Bayes and random forest, and these are the subject of this comparison. Naïve Bayes is a simple probabilistic algorithm that applies Bayes’ theorem under an assumption of feature independence. A random forest is a collection of decision trees, each trained on a random subset of the data, with the final classification decided by majority vote; this is also termed an ‘ensemble’ technique. Both are simple to implement and robust in that they are not regarded as prone to overfitting. Despite their simplicity, both have been reported as “surprisingly accurate” in use.
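For orientation, an equivalent set-up of the two classifiers in Python with scikit-learn might look like the following minimal sketch. It assumes a prepared feature matrix X and 0/1 satisfaction labels y; GaussianNB is one of several naïve Bayes variants, and the appropriate choice depends on how the survey ratings are encoded.

# Sketch: the two classifiers under comparison, scored with F1 and AUC.
# Assumes X (features) and y (0/1 labels) have already been prepared.
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, roc_auc_score

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

for name, model in [("Random forest", RandomForestClassifier(random_state=0)),
                    ("Naive Bayes", GaussianNB())]:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]
    print(name, f1_score(y_test, pred), roc_auc_score(y_test, proba))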
Results
Feature selection was undertaken by successively removing columns, one at a time, with a column filter and re-scoring the models. Table 1 shows the scores as each feature was removed; changes of more than ±0.002 from the ‘None’ baseline (a margin that allows for rounding differences) were treated as meaningful. If a score is reduced on feature removal, the model is negatively affected and the feature contributes to the model; conversely, if the score increases, the model does better without the feature. The procedure can be expressed directly in code, as in the sketch below.
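This sketch assumes the same X and y as before and uses cross-validation for the scores; the original workflow may well have used a single train/test split instead.

# Sketch: leave-one-feature-out scoring, mirroring the column-filter step.
# Assumes X is a pandas DataFrame of features and y the 0/1 labels.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import f1_score, roc_auc_score

for dropped in [None] + list(X.columns):
    X_sub = X if dropped is None else X.drop(columns=[dropped])
    model = RandomForestClassifier(random_state=0)
    proba = cross_val_predict(model, X_sub, y, cv=5,
                              method="predict_proba")[:, 1]
    print(dropped or "None",
          round(f1_score(y, proba >= 0.5), 3),
          round(roc_auc_score(y, proba), 3))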
With the random
forest, most of the features make a contribution. For naïve Bayes, there are a
number of features whose removal improves the model.
Table 1: Feature selection - effect on model scores

                                        Random Forest       Naïve Bayes
Filtered out                            F-score   AUC       F-score   AUC
None                                    0.892     0.968     0.778     0.881
Gender                                  0.887     0.967     0.786     0.882
Customer loyalty                        0.871     0.958     0.785     0.878
Age                                     0.889     0.968     0.784     0.882
Type of Travel                          0.859     0.955     0.756     0.866
Class                                   0.886     0.967     0.771     0.875
Online check-in                         0.888     0.964     0.780     0.876
Flight Distance                         0.890     0.967     0.775     0.882
Departure/Arrival time convenient       0.889     0.966     0.784     0.882
Ease of Online booking                  0.873     0.948     0.779     0.875
Gate location                           0.873     0.948     0.785     0.882
Food and drink                          0.886     0.967     0.787     0.884
Seat comfort                            0.882     0.964     0.777     0.883
Inflight entertainment                  0.890     0.967     0.786     0.888
On-board service                        0.888     0.967     0.783     0.882
Leg room service                        0.889     0.966     0.788     0.880
Baggage handling                        0.890     0.965     0.781     0.884
Checkin service                         0.884     0.964     0.792     0.879
Inflight service                        0.886     0.966     0.784     0.883
Cleanliness                             0.885     0.964     0.782     0.883
Departure delay in minutes              0.890     0.968     0.795     0.882
Arrival delay in minutes                0.892     0.967     0.793     0.882
Using these set-ups (tuned parameters and the selected features), the random forest model showed a slight decline in both F-score and AUC, while the naïve Bayes model improved; see Table 2.
Table 2: Results summary before and after tuning and feature selection

                                        Random Forest       Naïve Bayes
Model                                   F-score   AUC       F-score   AUC
Base model                              0.892     0.968     0.778     0.881
Tuned parameters and features           0.886     0.966     0.789     0.887
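The ‘tuned parameters’ row reflects a sweep over the models’ main settings. A minimal sketch of that kind of search for the random forest is shown below, continuing with X_train and y_train from the earlier sketch; the grid values are assumptions, not the settings actually used.

# Sketch: hyperparameter tuning for the random forest via grid search.
# Grid values are illustrative assumptions.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 10, 20],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, scoring="f1", cv=5)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))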
Conclusions
Overall, model performance was at a high level, with general ‘accuracy’ in the 80-90% range and recall of 80-85%. This proved hard to improve on, whether by tuning parameters in the random forest model or by selecting which features to include; on the other hand, it means these models are simple to set up and run. Kelleher et al (2015) note that naïve Bayes models are often used “to define a baseline accuracy score” because they are so easy to implement.
Predicting customer satisfaction was not the aim in itself. What the model might do is allow the airline to analyse which features are important contributors to satisfaction, which means studying ‘accuracy’ in terms of the positive outcomes; hence evaluation measures such as recall, F-score and AUC were chosen. This worked well with the naïve Bayes model, where feature selection gave a clear outcome, with some attributes contributing positively to the model and some negatively. For the random forest algorithm this was less successful, as all attributes made a mild positive contribution. In terms of interpretability, a single decision tree might have been preferable, although that would have brought other problems, such as a tendency to overfit and probably lower accuracy. Binary logistic regression is another method that might have been useful, as its outputs show attribute contributions explicitly (a sketch follows below). Another option would be to use several models together: Khan et al (2024) review the literature on classification problems (albeit class-imbalance ones) and conclude that ensemble methods generally show better performance.
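On the logistic regression point: the fitted coefficients give a direct readout of each attribute’s contribution. A minimal sketch, assuming the same X and y as earlier and standardising the features so the coefficients are comparable:

# Sketch: logistic regression coefficients as attribute contributions.
# Assumes X is a pandas DataFrame of features and y the 0/1 labels.
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X, y)
coefs = pipe.named_steps["logisticregression"].coef_[0]
for name, c in sorted(zip(X.columns, coefs), key=lambda t: -abs(t[1])):
    print(f"{name:35s} {c:+.3f}")

A positive coefficient pushes a passenger towards ‘satisfied’ and a negative one away from it, which directly addresses the ‘which features matter’ question.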
References
Kelleher, J., Mac Namee, B., & D’Arcy, A. (2015). Fundamentals of Machine Learning for Predictive Data Analytics. MIT Press.