Customer satisfaction prediction with Michigan-style learning classifier system (pdf)

https://link.springer.com/content/pdf/10.1007%2Fs42452-019-1493-1.pdf

Case Study

Customer satisfaction prediction with Michigan-style learning classifier system

Keivan Borna1 · Shokoofeh Hoseini2 · Mohammad Ali Mehdi Aghaei3

1 Department of Computer Science, Faculty of Mathematics and Computer Science, Kharazmi University, Tehran, Iran. 2 Department of IT Management, Faculty of Management, Kharazmi University, Tehran, Iran. 3 Department of IT Management, Research and Science Branch Islamic Azad University, Tehran, Iran.

SN Applied Sciences (2019) 1:1450 | https://doi.org/10.1007/s42452-019-1493-1
Received: 25 May 2019 / Accepted: 12 October 2019 / Published online: 21 October 2019
© The Author(s) 2019

Abstract

Many different classification algorithms can be used to analyze, classify and predict data. A learning classifier system (LCS), known as a genetics-based machine learning system, combines machine learning with evolutionary computing and other heuristics to produce an adaptive system that learns to solve a particular problem. This paper uses a Michigan-style LCS in the context of bank customer satisfaction to classify customers into two groups: unsatisfied and satisfied. Three different Rule Compaction strategies are applied, and the resulting rule populations are compared by accuracy and by micro/macro population size. The results identify the features that most influence the prediction.

Keywords Learning classifier system · Evolutionary algorithm · Michigan style LCS · Machine learning · Prediction

1 Introduction

[Learning] Classifier Systems (LCSs) [1, 2] are a kind of rule-based system (RBS) [3, 4] with general mechanisms for parallel rule processing, adaptive generation of new rules, and testing the effectiveness of existing rules. These mechanisms aim at more reliable AI learning systems that avoid “brittleness”. For a further understanding of LCS, see [1, 5, 6]. This paper explains the reasons for using LCS as a genetics-based machine learning (GBML) [7, 8] system for prediction. A preprocessing step is required to prepare the dataset. Experimental results are obtained by applying three Rule Compaction algorithms [9, 10] to a dataset of customer satisfaction information from Santander Bank [11].

Section 2 gives the motivation for using LCS. The proposed method is presented in Sect. 3, the concept of Rule Compaction and its algorithms in Sect. 4, experimental results and evaluation are discussed in Sect. 5, and finally Sect. 6 is devoted to the conclusions.

2 Why use LCS?

LCS algorithms in general constitute a unique alternative to other well-known machine learning strategies that follow the classic paradigm of seeking a single ‘best’ model that can be applied to the entire dataset. There are many LCS implementations [12] for prediction and classification. The following advantages encourage us to use LCS [13, 17]:

• Model free: LCSs make limited assumptions about the environment or the patterns of association within the data [17].
• Ensemble learner: a predictive learning system is built by integrating multiple learners to improve performance and accuracy. Majority voting and averaging are two applicable ensemble methods [17].
• Stochastic learner: learning is non-deterministic, which is an advantage over deterministic learning on large-scale or highly complex problems.
• Implicitly multi-objective: general and accurate rules are obtained under implicit and explicit pressures that encourage maximal generality/simplicity [17].
• Interpretable: LCS rules are logical IF:THEN statements that humans can interpret [14].

3 Proposed method

Figure 1 shows the phases of the proposed method. It starts by preprocessing the raw dataset, then applies three rule compaction strategies separately to the processed dataset. After the predicted results are obtained, a comprehensive evaluation is carried out and presented in Sect. 5. Subsection 3.1 discusses the dataset used, subsection 3.2 presents the preprocessing steps required to prepare the dataset, and subsection 3.3 describes the configuration parameters used for applying LCS.

3.1 The dataset

The dataset consists of 369 anonymized features, excluding the ID/target column. A challenge with this dataset is that the meaning of each feature is unknown, so little domain knowledge or intuition can be used.

3.2 The preprocessing steps

Figure 2 shows the five sub-steps applied in preprocessing. The first step removes duplicate columns. Several columns hold a single constant value; these are removed in the second step. In the third step, strongly correlated columns are identified and only one of them is retained in the training dataset; 0.85 is chosen as the threshold for high correlation. There is a massive mismatch between the numbers of satisfied customers (96%) and unsatisfied ones (4%): satisfied customers outnumber unsatisfied ones by a factor of roughly 24.27. In the fourth step the two classes are balanced with the Synthetic Minority Over-sampling Technique (SMOTE) [15]; a SMOTE implementation is available in the R package DMwR. After these steps the balanced dataset has 147,392 records and 143 features, excluding ID and Target. The last step converts all attribute values into binary format, because the LCS implementation acts as a rule-based system (like other GBML systems) and has been coded to handle binary values.

3.3 LCS configuration

The chosen configuration parameters and their values are discussed below.

• Learning Iteration: one of the most critical run parameters.
In this case, LCS iterates over twice as many instances as the folded dataset size (23,826), which amounts to two epochs and generates more reliable rules [9].
• Maximum Population Size: must be determined by initial trial and error; in this case a maximum population size of 7000 is applied [9].

Fig. 1 Proposed method steps (boxes: Raw dataset; Preprocessing; Cleaned datasets; Run LCS; QRF applied, QRC applied and PDRC applied, each followed by Result Extraction; Result compared)

Fig. 2 Preprocessing steps (boxes: Raw Dataset; Remove identical features; Remove constant features; Remove highly correlated variables; Balanced data set; Binary Representation; Cleansed Dataset)

• Cross Validation: fivefold cross validation (CV) is used. A complete run is performed serially for each fold and evaluated on the corresponding training and testing datasets to obtain better predictions.
• Attribute Tracking/Feedback: Attribute tracking (AT) and Attribute feedback (AF) are used to guide the algorithm to more intelligently explore reliable a (...truncated)