Customer satisfaction prediction with Michigan-style learning classifier system
Case Study
Keivan Borna¹ · Shokoofeh Hoseini² · Mohammad Ali Mehdi Aghaei³
© The Author(s) 2019 OPEN
Abstract
Many different classification algorithms can be used to analyze, classify and predict data. A learning classifier system (LCS), known as a genetics-based machine learning system, combines machine learning with evolutionary computing and other heuristics to produce an adaptive system that learns to solve a particular problem. This paper uses the Michigan-style LCS in the context of bank customer satisfaction to classify customers into two groups: unsatisfied and satisfied customers. Three different Rule Compaction strategies are applied, and the resulting rule populations are compared in terms of accuracy and micro/macro population size. The results identify the features that most influence prediction.
Keywords Learning classifier system · Evolutionary algorithm · Michigan style LCS · Machine learning · Prediction
1 Introduction
[Learning] Classifier Systems (LCSs) [1, 2] are a kind of rule-based system (RBS) [3, 4] with general mechanisms for parallel rule processing, adaptive generation of new rules, and testing the effectiveness of existing rules. These mechanisms aim at more reliable learning systems in AI that avoid “brittleness”. For a further understanding of what an LCS is, see [1, 5, 6]. This paper presents the reasons for using an LCS as a Genetics-Based Machine Learning (GBML) [7, 8] system for prediction. A preprocessing step is required to prepare the dataset. Experiments are conducted by applying three Rule Compaction algorithms [9, 10] to a dataset of customer satisfaction information from Santander Bank [11]. Section 2 explains the motivation for using LCS. The proposed method is presented in Sect. 3, the concept of Rule Compaction and its algorithms is presented in Sect. 4, experimental results and evaluation are discussed in Sect. 5, and finally Sect. 6 is devoted to the conclusions.
2 Why use LCS?
LCS algorithms in general constitute a unique alternative to other well-known machine learning strategies that follow the classic paradigm of seeking a single ‘best’ model that can be applied to the entire dataset. Many LCS implementations [12] exist that perform prediction/classification. The advantages that encourage us to use LCS are listed here [13, 17].
• Model free: LCSs make limited assumptions about the environment or the patterns of association within the data [17].
• Ensemble learner: an LCS builds a predictive learning system by integrating multiple learners to improve performance and accuracy. Majority voting and averaging are two of the applicable ensemble methods [17].
• Stochastic learner: learning is non-deterministic, which is an advantage over deterministic learning on large-scale or high-complexity problems.
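As a rough illustration of the two ensemble methods named in the ensemble bullet above, the following Python sketch combines the outputs of several learners by majority voting and by averaging. The function names, the class labels and the 0.5 cutoff are assumptions made for this example only; they are not taken from the LCS implementation used in this paper.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted most often (ties go to the first seen)."""
    return Counter(labels).most_common(1)[0][0]


def average_vote(scores, cutoff=0.5):
    """Average per-learner scores for 'unsatisfied' and threshold the mean.

    The cutoff of 0.5 is an assumed value for illustration.
    """
    mean_score = sum(scores) / len(scores)
    return "unsatisfied" if mean_score >= cutoff else "satisfied"


# Three hypothetical learners vote on one customer.
votes = ["satisfied", "unsatisfied", "satisfied"]
print(majority_vote(votes))
print(average_vote([0.2, 0.9, 0.7]))
```

In an LCS the "learners" would be the rules that match a given instance, so the vote is taken over a different subset of rules for each customer.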
* Keivan Borna; Shokoofeh Hoseini; Mohammad Ali Mehdi Aghaei
| ¹Department of Computer Science, Faculty of Mathematics and Computer Science, Kharazmi University, Tehran, Iran. ²Department of IT Management, Faculty of Management, Kharazmi University, Tehran, Iran. ³Department of IT Management, Research and Science Branch, Islamic Azad University, Tehran, Iran.
SN Applied Sciences (2019) 1:1450 | https://doi.org/10.1007/s42452-019-1493-1
Received: 25 May 2019 / Accepted: 12 October 2019 / Published online: 21 October 2019
• Implicitly multi-objective: LCSs obtain general and accurate rules through implicit and explicit pressures that encourage maximal generality/simplicity [17].
• Interpretable: LCS rules are logical IF:THEN statements that are interpretable by humans [14].
3 Proposed method
Figure 1 shows the phases of the proposed method. The raw dataset is first preprocessed, and three rule compaction strategies are then applied separately to the processed dataset. After the predicted results are obtained, a comprehensive evaluation is carried out and presented in Sect. 5. Subsection 3.1 discusses the dataset used, subsection 3.2 presents the preprocessing steps required to prepare the dataset, and subsection 3.3 describes reasonable configuration parameters for applying the LCS.
3.1 The dataset
The dataset consists of 369 anonymized features, excluding the ID and target columns. A challenge with this dataset is that the meaning of each feature is unknown, so little domain knowledge or intuition can be used.
3.2 The preprocessing steps
Figure 2 shows the five sub-steps applied in preprocessing. The first step is to remove duplicate columns. Several columns hold a single constant value; these are removed in the second step. In the third step, strongly correlated columns are identified using 0.85 as the threshold for high correlation, and only one column from each strongly correlated group is kept in the training dataset. There is a massive mismatch between the numbers of satisfied customers (96%) and unsatisfied ones (4%): satisfied customers outnumber unsatisfied ones by roughly a factor of 24.27. In the fourth step the two classes are balanced using the Synthetic Minority Over-sampling Technique (SMOTE) [15], an implementation of which is available in the R package DMwR. After these preprocessing steps, the balanced dataset contains 147,392 records and 143 features, excluding ID and Target.
The last step is to convert all attribute values into binary format, because the LCS implementation acts as a rule-based system (like other GBML systems) and has been coded to handle binary values.
3.3 LCS configuration
The configuration parameters and their values are discussed below.
• Learning iterations: one of the most critical run parameters. In this case, the LCS iterates over twice as many instances as the folded dataset size (23,826), which amounts to two epochs and generates more reliable rules [9].
• Maximum population size: must be determined by initial trial and error; in this case a maximum population size of 7000 is applied [9].
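The two run parameters quoted so far can be written down as a small configuration object, which also makes the iteration count explicit. This is only a sketch: the class and field names are hypothetical, and it assumes that one learning iteration presents one training instance to the LCS, as is usual for Michigan-style systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LCSConfig:
    """Hypothetical container for the two run parameters quoted above."""
    fold_size: int = 23_826          # folded dataset size reported in the text
    epochs: int = 2                  # each instance is presented twice
    max_population_size: int = 7000  # found by trial and error per the text

    @property
    def learning_iterations(self) -> int:
        # Assumption: one iteration presents one training instance.
        return self.epochs * self.fold_size


config = LCSConfig()
print(config.learning_iterations)  # 2 * 23,826
```

Deriving the iteration count from the fold size, rather than hard-coding it, keeps the "two epochs" relationship intact if the dataset size changes.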
Fig. 1 Proposed method steps (boxes: Raw dataset; Preprocessing; Cleaned datasets; Run LCS; QRF applied, QRC applied and PDRC applied, each followed by Result Extraction; Result compared)
Fig. 2 Preprocessing steps (Raw Dataset → Remove identical features → Remove constant features → Remove highly correlated variables → Balanced data set → Binary Representation → Cleansed Dataset)
• Cross validation: fivefold cross validation (CV) is used. A complete run is performed serially for each fold and then evaluated on that fold's training and testing datasets to obtain better predictions.
• Attribute Tracking/Feedback: Attribute tracking (AT)
and Attribute feedback (AF) are used to guide the algorithm to more intelligently explore reliable a (...truncated)