Naive Bayes-Guided Bat Algorithm for Feature Selection
Naive Bayes-Guided Bat Algorithm for Feature Selection
Ahmed Majid Taha,1 Aida Mustapha,2 and Soong-Der Chen3
1College of Graduate Studies, Universiti Tenaga Nasional, 43000 Kajang, Selangor, Malaysia
2Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia
3College of Information Technology, Universiti Tenaga Nasional, 43000 Kajang, Selangor, Malaysia
Received 30 September 2013; Accepted 14 November 2013
Academic Editors: Z. Cui and X. Yang
Copyright © 2013 Ahmed Majid Taha et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
When the amount of data and information is said to double in every 20 months or so, feature selection has become highly important and beneficial. Further improvements in feature selection will positively affect a wide array of applications in fields such as pattern recognition, machine learning, or signal processing. Bio-inspired method called Bat Algorithm hybridized with a Naive Bayes classifier has been presented in this work. The performance of the proposed feature selection algorithm was investigated using twelve benchmark datasets from different domains and was compared to three other well-known feature selection algorithms. Discussion focused on four perspectives: number of features, classification accuracy, stability, and feature generalization. The results showed that BANB significantly outperformed other algorithms in selecting lower number of features, hence removing irrelevant, redundant, or noisy features while maintaining the classification accuracy. BANB is also proven to be more stable than other methods and is capable of producing more general feature subsets.
The motivations to perform feature selection in a classification experiment are two-fold. The first is to enhance the classifier performance by selecting only useful features and removing irrelevant, redundant, or noisy features. The second is to reduce the number of features when classification algorithms could not scale up to the size of feature set either in time or space. In general, feature selection consists of two essential steps, which are searching for desired subset using some search strategies and evaluating the subset produced. A search strategy in searching the feature subset could be exhaustive or approximate. While exhaustive search strategy evaluates all probabilities of the feature subset, approximate search strategy only generates high quality solutions with no guarantee of finding a global optimal solution .
One of the most prominent algorithms in exhaustive search is branch and bound method . Exhaustive search guarantees optimal solution but this method is not practical for even a medium-sized dataset as finding the optimal subset of features is an NP-hard problem . Fornumber of features, the number of possible solutions will be exponential to . Since exhaustive search is not practical, research effort and focus on search strategies have since shifted to metaheuristic algorithms, which are considered as a subclass of approximate methods. The literature has shown that the metaheuristic algorithms are capable of handling large-size problem instances with satisfactory solutions within a reasonable time [4–7].
After searching for feature subset, each candidate from the resulting subset generated needs to be evaluated based on some predetermined assessment criteria. There are three categories of feature subset evaluations depending on how the searching strategy is being associated with the classification model, whether as filter, wrapper, or embedded methods. These three categories will be explained in more detail in the next section.
Nonetheless, the main challenge in feature selection is to select the minimal subset of features with modicum or no loss of classification accuracy. While the literature has shown numerous developments in achieving this [8–10], the basis of comparison is rather limited in terms of number of features, classification accuracy, stability, or feature generalization in isolation. Generalization of the produced features is important to investigate their effect on the performance of different classifiers.
In view of this, the objectives of this paper are as follows: first to design a new hybrid algorithm that exploits a Naive Bayes algorithm to guide a Bat Algorithm, second to evaluate the performance of the proposed hybrid algorithm against other well-known feature selection algorithms, and third to test the effect of the resulting features in terms of generalization using three different classifiers. The remainder of this paper is organized as follows. Section 2 reviews the related works on searching and evaluating algorithms in feature selection. Section 3 details out the principles of Naive Bayes algorithms, Section 4 presents mechanics of the Bat Algorithm, and Section 5 introduces the proposed Naïve Bayes-guided Bat Algorithm for feature selection. Next, Section 6 describes the experimental settings, Section 7 discusses implications of the results, and, finally, Section 8 concludes with some recommendations for future work.
2. Related Work
The application of metaheuristics algorithms in searching the feature subset has shown high effectiveness as well as efficiency to solve complex and large problems in feature selection. In general, there are two categories of metaheuristic search algorithms: single solution-based metaheuristics (SBM) that manipulate and transform a single solution during the search and population-based metaheuristics (PBM) where a whole population of solutions is evolved. The simplest and oldest SBM method used in feature selection is Hill Climbing (HC) algorithm [1, 11]. This algorithm starts with a random initial solution and swaps the current solution with a neighboring solution in the following iteration in order to improve the quality of solution. Searching will stop only when all the neighboring candidate subsets are poorer than the current solution, which means that the algorithm will be most probably trapped in local optimum .
In order to overcome this problem, Simulated Annealing (SA) is proposed . SA accepts the worse moves that commensurate to the parameter determined at the initial stage, called the temperature, which is inversely proportional to the change of the fitness function. In more recent work, a modified SA algorithm called the Great Deluge Algorithm (GDA) is proposed  to provide a deterministic acceptance function of the neighboring solutions. Tabu Search (TS) also accepts nonimproving solutions to escape from local optima. TS stores information related to the search process, which is a list of all previous solutions or moves in what is termed as Tabu list [13, 14]. Nonetheless, SBM algorithms such as Hill Climbing and Simulated Annealing suffer from two major disadvantages. First, they often converge towards local optima and second they can be very sensitive to the initial solution .
The PBM methods have been equally explored in feature selection. Different from SBM, PBM represents an iterative improvement in a population of solutions that works as follows. Firstly, the population is initialized. Then, a new population of solutions is generated. Next, the new population is integrated into the existing one by using some selection procedures. The search process is terminated when a certain criterion is satisfied. The most prominent and oldest population-based solution used in feature selection is Genetic Algorithm (GA) [5, 15, 16]. The major roles in GA are the crossover and mutation operations used to couple solutions and to arbitrarily adjust the individual content, to boost diversity aiming to decrease the risk of sticking in local optima.
Another PBM algorithm is the Ant Colony Optimization (ACO), which takes form as a multiagent system, whereby the building unit of this system represents virtual ants as inspired by the behavior of real ants. In nature, a chemical trace called pheromone is left on the ground and is used to guide a group of ants heading for the target point since ants are not able to see very well [6, 17, 18]. Another nature-inspired algorithm is the Particle Swarm Optimization (PSO) algorithm that simulates the social behavior of natural creatures such as bird flocking and fish schooling to discover a place with adequate food [7, 19]. Scatter Search (SS) is another PBM method that recombines solutions elected from a reference set to generate other solutions by building an initial population satisfactory to the criteria of quality and diversity .
The next step in feature selection is evaluating the feature subset produced. The evaluation methods can be broadly classified into three categories. First, the filter approach or independent approach evaluates candidate solutions by depending on intrinsic characteristics of the features themselves, without considering any mining algorithm. Filter approach includes several types such as distance , information , dependency , or consistency . Second, the wrapper approach or dependent approach requires one predetermined learning model and selects features with the purpose of improving the generalization performance of that particular learning model . Although the wrapper approach is known to outperform the filter approach with regard to prediction accuracy , the method is time-consuming. Third, the embedded approach in feature evaluation attempts to capitalize on advantages of both approaches by implementing the diverse evaluation criteria in different search phases . By integrating the two approaches at different phases, the embedded approach is capable to achieve accuracy of a wrapper approach at the speed of a filter approach. Choosing an evaluation method for particular search method is a critical mission because the interaction between the evaluation method and the search strategy will affect the overall quality of solution.
3. Naive Bayes Algorithm
Naive Bayes (NB) algorithm is one of the most effective and efficient inductive learning algorithms for data mining along with machine learning. This algorithm belongs to the wrapper approach. NB is considered a simple classifier based on the classical statistical theory “Bayes theorem.” The Bayesian algorithm is branded “naïve” because it is founded on Bayes Rule, which has a strong supposition that the features are conditionally independent from each other with regard to the class . In the literature, the NB algorithm has proven its effectiveness in various domains such as text classification , improving search engine quality , image processing [30, 31], fault prediction , and medical diagnoses .
Naive Bayes classifier works as follows: letbe a vector of random variables denoting the observed attribute values in the training setto certain class labelin the training set. The probability of each class given the vector of observed values for the predictive attributes can be computed using the following formula: whereis the prior probability of class andis the class conditional probability density functions. Basically put, the conditional independence assumption assumes that each variable in the dataset is conditionally independent of the other. This is simple to compute for test cases and to estimate from training data as follows: where is the value of the th attribute in andis the number of attributes. Let be the number of classes, andis the th class; the probability distribution over the set of features is calculated using the following equation:
Effectiveness of Naive Bayes algorithm in classification and learning is attributed to several characteristics such as the following .(i)High computational efficiency as compared to other wrapper methods because it is inexpensive since it is considered linear time complexity classifier.(ii)Low variance due to less searching.(iii)Incremental learning because NB functions work from approximation of low-order probabilities that are deduced from the training data. Hence, these can be quickly updated as new training data are obtained.(iv)High capability to handle noise in the dataset.(v)High capability to handle missing values in the dataset.
Furthermore, NB implementation has no required adjusting parameters or domain knowledge. The major drawback of NB only lies in the assumption of features independence . Despite this, NB often delivers competitive classification accuracy and is widely applied in practice especially as benchmark results. Good survey on the variety of adaptations to NB in the literature can be found in .
4. Bat Algorithm
The idea of the Bat Algorithm (BA) is to mimic the behaviors of bats when catching their prey. BA was first presented in  and it was found to outperform Particle Swarm Optimization (PSO) and Genetic Algorithms (GA) in evaluation using benchmark functions. BA has also been successfully applied to tough optimization problems such as motor wheel optimization problem , clustering problem , global engineering optimization, and constrained optimization tasks [37–40]. Very recently, two versions of bat-inspired algorithms have been proposed for feature selection [41, 42]. The implementation of BA is more complicated than many other meta-heuristic algorithms  because each agent (bat) is assigned a set of interacting parameters such as position, velocity, pulse rate, loudness, and frequencies. This interaction affects the quality of a solution and the time needed to obtain such solution.
The principle of bat algorithm is as follows. A swarm of bats is assumed to fly randomly with velocity at position with a fixed frequency, varying wavelength, and loudnessto search for a prey. They have the capability to adjust the wavelength of their emitted pulses and regulate the rate of pulse emission, which is important to determine their closeness of the target. Although the loudness can be different in many ways, the loudness differs from a large (positive)to a minimum constant value. The frequencyis in the range that corresponds to a range of wavelengths . For example, a frequency range of [20 kHz, 500 kHz] corresponds to a range of wavelengths from 0.7 mm to 17 mm.
5. Proposed Naive Bayes-Guided Bat Algorithm 5.1. Frequency
Frequency in the proposed algorithm is represented as a real number as defined in (4). The choice of minimum and maximum frequency depends on the application domain, where is a random number range between 0 and 1. Frequency also affects the velocity as shown in (5). Consider the following:
The velocity of each bat is represented as a positive integer number. Velocity suggests the number of bat attributes that should change at a certain moment of time. The bats communicate with each other through the global best solution and move towards the global best position (solution). The following equation shows the formula for velocity: whererefers to the difference between the length of global best bat and the length of the th bat. When the difference is positive, this means that the global best bat has more features than those of the th bat. When the result is summed with the previous velocity, it will accelerate the th bat towards the global best bat. If the difference is negative, this means that the ith bat has more features than those of the global best bat. Therefore, when the output is summed with the previous velocity, it will decrease the velocity of ith bat and help to attract it closer to global best bat. In the proposed Bat Algorithm-Naive Bayes (BANB) algorithm, the maximum velocity was setting () equal to , whereis the number of features. In the proposed BANB, (2) is used to adjust the velocity during each iteration; therefore, the proposed algorithm is adaptive for feature selection problem in order to mimic the original algorithm behavior. Velocity representation is also one major difference between BANB and the Binary Bat Algorithm (BBA) . In BBA, the velocity is calculated for each single feature; hence, the algorithm is more time-consuming and departing from the original algorithm attitude. On the contrary, the velocity in the proposed BANB is calculated once for the entire solution; hence, the velocity amount determines the piece of change.
5.3. Position Adjustment
In the proposed algorithm, each bat position is formulated as a binary string of length , whereis the total number of features. Each feature is represented by bit, where “1” means that the corresponding feature is selected and the “0” means that it is not selected. The positions are categorized into two groups according to the bit difference between the th bat and the global best bat in order to align exploitation and exploration during searching.
The bat’s position is adjusted depending on one of the following conditions. In the case where the velocity of th bat is lower or equal to the number of different bits, th bat will copy some features from global best bat, thus moving towards global best bat, while still exploring new search space. In the case where the velocities of th bat are higher than the velocity of global best bat, then the th bat will import all features from the global best bat to be the same as the global best bat with a few different bits to facilitate further exploitation. The following equation shows the position adjustment, whereis bat position, andis the velocity of the th bat at time :
Loudness in the proposed algorithm is represented as the change in number of features at certain time during local search around the global best bat, as well as local search around the th bat. The formula for loudness is shown in (7), whereis the average loudness of all the bats at certain iteration and. The value for sound loudness () ranges between the maximum loudness and minimum loudness. Consider the following:
Generally, the loudness value will decrease when the bat starts approaching the best solution. The following equation shows that the amount of decrease is determined by :
The value for sound loudness also plays an important role in obtaining good quality solutions within a reasonable amount of time. The choice of the maximum and minimum loudness depends on the domain of application and also the size of the dataset. In the proposed BANB algorithm, the maximum loudness has been determined empirically as , where is number of features. Value for maximum loudness is dynamic depending on number of features in certain dataset. For example, whenand, the bat begins to reduce the number of features from 3 features to 2 features and the value then becomes a single feature when it gets closer to the target.
5.5. Pulse Rate
Pulse rate has the role to decide whether a local search around the global best bat solution should be skipped or otherwise. Higher pulse rate will reduce the probability of conducting a local search around the global best and vice versa. Therefore, when the bat approaches the best solution, pulse rate value will increase and subsequently reduce the chances to conduct a local search around the global best. The amount of increase is determined byas defined in the following:
5.6. Fitness Function
Each candidate solution is using a fitness function defined in (10), whereis the classification accuracy, TF is the total number of all features, and SF is the number of selected features.and are two parameters corresponding to the weight of classification accuracy and subset length, whereand . From (10), we can see that the importance of classification accuracy and subset size is weighted differently. Generally, classification accuracy is given more weight than the size of the subset. In this experiment, the two parameters have been set as follows:, . Consider the following:
The complete algorithm for the proposed hybrid BA guided by Naive Bayes classifier (BANB) is shown in Algorithm 1.
Algorithm 1: BANB Algorithm.
6. Experiments and Results
The objective of the experiments is to evaluate the performance of the proposed Naive Bayes-guided Bat Algorithm (BANB) in terms of number of features selected and the classification accuracy achieved. To achieve this objective, we compared the number of features and classification accuracies of BANB with several well-known algorithms, which are Genetic Algorithms (GA) , Particle Swarm Optimization (PSO) , and Geometric Particle Swarm Optimization (GPSO) . Similar to the proposed BANB, we also used Naive Bayes classifier for all comparative algorithms as the attribute evaluator. However, the parameters for the algorithms had the same settings as those used by the original authors. For the proposed algorithm, the parameters were set to the following values: population size = 25 and decrease sound loudness and increase pulse rate both are set to 0.6. The initial value of pulse rate is equal to 0.2. The proposed BANB algorithm and other algorithms were run for 20 times with different initial solutions. Following [4, 17], all the algorithms were terminated after 250 iterations.
6.1. Description of Dataset
For the experiments, twelve datasets were considered to cover both cases of binary and multiclass data. Three of the datasets, namely, M-of-N, Exactly, and Exactly2, were sourced from . M-of-N is an artificial binary class since the decision attribute consisting of two class labels and the dataset were generated from a uniform distribution to create the artificial domain. Exactly and Exactly2 are artificial binary classification datasets, generated based on x-of-y concepts, which are not linearly separable and are known to be difficult for many classes of learning algorithms . The remaining datasets were taken from the UCI data repository . The datasets are Vote, Credit, LED, Derm, Derm2, Lung, WQ, Heart, and Mushroom. Vote is widely used as a binary classification dataset in the literature. The dataset represents votes for each of the U.S. House of Representatives congressmen with the class label democrat and republican. Credit dataset is a binary classification data that is concerned with credit card applications. LED dataset in display domain is a multiclass classification data as the class label includes ten possible values in which the first seven features determine the class label of a pattern, whilst the rest of the 17 features are irrelevant.
Derm and Derm2 represent real data in dermatology concerning differential diagnosis of erythematosquamous diseases. The class labels contain six values, which refer to six different diseases. Lung dataset is the pathological types of Lung cancer that aims to demonstrate the power of the optimal discriminant plane even in ill-posed settings . WQ is a multiclass label dataset that originated from the daily measures of sensors in an urban waste water treatment plant. The idea is to categorize the operational state of the plant with the purpose of predicting faults out of the state variables of the plant at each of the phases in the water treatment procedure. Heart is a binary class data that contains 76 attributes although all the published experiments reference to using only 14 of the original attributes. This data has been used to predict heart diseases, whereby the class label of zero and one refers to the absence or existence of heart disease in the patient. Finally, Mushroom is a binary class dataset that includes characterization of hypothetical samples identical to 23 types of gilled mushrooms in the Agaricus and Lepiota family. Table 1 shows the characteristics of the datasets.
Table 1: Dataset characteristics.
6.2. Results for Feature Selection Experiment
In this experiment, we compared the proposed BANB against GA , PSO , and GPSO  in terms of the number of features selected from the original dataset. Table 2 provides the comparison results. The number of features obtained from the comparative algorithms in Table 2, and the best results are highlighted in bold. Then the results are statistically tested using two tests, Kolmogorov-Smirnov and Levene test . However, the Kolmogorov-Smirnov and Levene test did not meet the assumptions of normality distribution and equality of variance which then led us to use Wilcoxon test. Essentially, this test is an alternative to the paired t-test, when the assumption of normality or equality of variance is not met . Wilcoxon test is rated to be a robust estimate tool that depended on the rank estimation . Table 3 presents Wilcoxon test results for the proposed BANB algorithm against other feature selection algorithms. From Table 3, between the brackets refer to the algorithm that performs better than another algorithm. The results of Wilcoxon test are considered to be statistically significant atless than 0.05 and are highly significant atless than 0.01.
Table 2: Average of selected features.
Table 3: Wilcoxon test results.
6.3. Results for Classification Accuracy Experiment
The second part of the experiment was to evaluate and compare the average classification accuracies achieved by BANB and other comparative algorithms over 10 runs, using 10-fold cross-validation method. Three well-known classifiers were employed for the purpose of evaluating the resulting subsets among different classifiers, which were JRip PART and J48 . Tables 4, 5, and 6 show the average classification accuracy and standard deviation values from the experiment.
Table 4: Average classification accuracy with standard deviation for JRip.
Table 5: Average classification accuracy with standard deviation for PART.
Table 6: Average classification accuracy and standard deviation for J48.
In selecting the feature subset, Table 2 shows that the proposed BANB algorithm obtained the smallest number of features across all datasets except for LED. Table 3 confirmed that the difference between BANB and the remaining comparative algorithms is highly significant except for LED and M-of-N datasets. More significantly, BANB is able to reduce the number of features up to a single feature in five datasets as shown in Table 2. In evaluating the feature subset, if we take into consideration the interaction between classification accuracy and number of features selected by the proposed BANB algorithm as compared to other algorithms, we can categorize the results into three cases. In the first case, a reduced number of features deliver the same classification accuracy. This is shown in the Exactly dataset that produced similar classification accuracy in both JRip and J48 classifiers and even higher accuracy in PART classifier. On the contrary, features selected by other algorithms included more features, which indicate that some of the features selected are redundant. This can be seen clearly in the Exactly2 dataset when all solutions achieved exactly the same accuracy in spite of variance in the number of selected features.
In the second case, the proposed algorithm reduced the number of features while at the same time increased the classification accuracy. For example, BANB selected only two features from the Lung dataset as opposed to additional eight features among other algorithms. The difference between the numbers of features selected is attributed to noisy features, which cause a decrease in classification accuracy such as in the Vote dataset. In the third case, smaller feature subset that is selected delivers a slightly lower classification accuracy, such as in Heart and Mushroom dataset with the exception of LED dataset. All algorithms could deliver the same accuracy with the same number of features because the LED dataset contains very protruding features.
Finally, it can be noted from Tables 4, 5, and 6 that the classification accuracies achieved by the proposed BANB algorithm are in less disagreement or very close across three different classifiers. This can be noted obviously from the experimental results using Exactly, Credit, Lung, and Derm datasets. To support this finding, we calculated standard deviation for each dataset over the three different classifiers and we averaged the values for each algorithm. The results were as follows: BANB equals 0.36, GPSO equals 0.99, PSO equals 1.04, and, finally, GA equals 1.11. This implies that the proposed feature selection algorithm BANB has better generalization as compared to other feature selection algorithms. Results from Table 2 also show that BANB is capable of selecting the same number of features for 9 out of 12 datasets over 20 iterations. This is followed by GA, GPSO, and, finally, PSO. It can be noted that the standard deviation values in Tables 4, 5, and 6 are zeros for 9 datasets. This means that our proposed BANB could obtain certain number of features with exactly the same features for each iteration. As a consequence, BANB showed the highest stability among all comparative algorithms.
In this paper, a new hybrid feature selection algorithm has been presented. The Bat Algorithm employed Naïve Bayes Algorithm to intelligently select the most convenient feature that could maximize the classification accuracy while ignoring redundant and noisy features. We compared our proposed algorithm with three other algorithms using twelve well-known UCI datasets. The performance was evaluated from four perspectives, which are the number of features, classification accuracy, stability, and generalization. From the experiments, we can conclude that the proposed Naïve Bayes-guided Bat Algorithm (BANB) outperformed other meta-heuristic algorithms with a selection of feature subsets that are significantly smaller with a less number of features. In terms of classification accuracy, BANB has proven to achieve equal, if not better results as compared to other algorithms. For stability, the proposed algorithm is more stable than other algorithms. Finally, from the perspective of generalization of results, the resulting features produced by BANB are also more general than other algorithms in practice. For future work, further investigations are required to observe the behavior of the proposed algorithm in gene expression and very high-dimensional datasets.
References E. G. Talbi, Metaheuristics: From Design to Implementation, John Wiley and Sons, 2009. B. Yu and B. Yuan, “A more efficient branch and bound algorithm for feature selection,” Pattern Recognition, vol. 26, no. 6, pp. 883–889, 1993. View at Publisher · View at Google Scholar · View at ScopusE. Amaldi and V. Kann, “On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems,” Theoretical Computer Science, vol. 209, no. 1-2, pp. 237–260, 1998. View at Google Scholar · View at ScopusR. Jensen and Q. Shen, “Semantics-preserving dimensionality reduction: rough and fuzzy-rough-based approaches,” IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 12, pp. 1457–1471, 2004. View at Publisher · View at Google Scholar · View at ScopusJ. Yang and V. Honavar, “Feature subset selection using genetic algorithm,” IEEE Intelligent Systems and Their Applications, vol. 13, no. 2, pp. 44–48, 1998. View at Publisher · View at Google Scholar · View at ScopusL. Ke, Z. Feng, and Z. Ren, “An efficient ant colony optimization approach to attribute reduction in rough set theory,” Pattern Recognition Letters, vol. 29, no. 9, pp. 1351–1357, 2008. View at Publisher · View at Google Scholar · View at ScopusX. Wang, J. Yang, X. Teng, W. Xia, and R. Jensen, “Feature selection based on rough sets and particle swarm optimization,” Pattern Recognition Letters, vol. 28, no. 4, pp. 459–471, 2007. View at Publisher · View at Google Scholar · View at ScopusM. Anbarasi, E. Anupriya, and N. Iyengar, “Enhanced prediction of heart disease with feature subset selection using genetic algorithm,” International Journal of Engineering Science and Technology, vol. 2, pp. 5370–5376, 2010. View at Google ScholarH. Vafaie and I. F. Imam, “Feature selection methods: genetic algorithms vs greedy-like search,” in Proceedings of the International Conference on Fuzzy and Intelligent Control Systems Louisville, 1994. J. C. W. Debuse and V. J. Rayward-Smith, “Feature subset selection within a simulated annealing data mining algorithm,” Journal of Intelligent Information Systems, vol. 9, no. 1, pp. 57–81, 1997. View at Google Scholar · View at ScopusR. Caruana and D. Freitag, “Greedy attribute selection,” in Proceedings of the 11th International Conference on Machine Learning, pp. 28–36. N. S. Jaddi and S. Abdullah, “Nonlinear great deluge algorithm for rough set attribute reduction,” Journal of Information Science and Engineering, vol. 29, pp. 49–62, 2013. View at Google ScholarM. A. Tahir, A. Bouridane, and F. Kurugollu, “Simultaneous feature selection and feature weighting using Hybrid Tabu Search/K-nearest neighbor classifier,” Pattern Recognition Letters, vol. 28, no. 4, pp. 438–446, 2007. View at Publisher · View at Google Scholar · View at ScopusA.-R. Hedar, J. Wang, and M. Fukushima, “Tabu search for attribute reduction in rough set theory,” Soft Computing, vol. 12, no. 9, pp. 909–918, 2008. View at Publisher · View at Google Scholar · View at ScopusA. Jain and D. Zongker, “Feature selection: evaluation, application, and small sample performance,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 2, pp. 153–158, 1997. View at Publisher · View at Google Scholar · View at ScopusR. Leardi, “Application of a genetic algorithm to feature selection under full validation conditions and to outlier detection,” Journal of Chemometrics, vol. 8, pp. 65–79, 1994. View at Google ScholarR. Jensen and Q. Shen, “Finding rough set reducts with Ant colony optimization,” in Proceedings of the UK Workshop on Computational Intelligence, pp. 15–22, 2003. R. K. Sivagaminathan and S. Ramakrishnan, “A hybrid approach for feature subset selection using neural networks and ant colony optimization,” Expert Systems with Applications, vol. 33, no. 1, pp. 49–60, 2007. View at Publisher · View at Google Scholar · View at ScopusE.-G. Talbi, L. Jourdan, J. García-Nieto, and E. Alba, “Comparison of population based metaheuristics for feature selection: application to microarray data classification,” in Proceedings of the 6th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA '08), pp. 45–52, April 2008. View at Publisher · View at Google Scholar · View at ScopusJ. Wang, A.-R. Hedar, G. Zheng, and S. Wang, “Scatter search for rough set attribute reduction,” in Proceedings of the International Joint Conference on Computational Sciences and Optimization (CSO '09), pp. 531–535, chn, April 2009. View at Publisher · View at Google Scholar · View at ScopusL. Bobrowski, “Feature selection based on some homogeneity coefficient,” in Proceedings of the 9th International Conference on Pattern Recognition, pp. 544–546, 1988. D. Koller and M. Sahami, Toward Optimal Feature Selection, 1996. M. Modrzejewski, “Feature selection using rough sets theory,” in Machine Learning: ECML-93, pp. 213–226, Springer, 1993. View at Google ScholarH. Liu and R. Setiono, “A probabilistic approach to feature selection: a filter solution,” in Machine Learning, pp. 319–327, 1996. View at Google ScholarH. Liu and L. Yu, “Toward integrating feature selection algorithms for classification and clustering,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, pp. 491–502, 2005. View at Publisher · View at Google Scholar · View at ScopusE. P. Xing, M. I. Jordan, and R. M. Karp, “Feature selection for high-dimensional genomic microarray data,” in Machine Learning, pp. 601–608, 2001. View at Google ScholarG. I. Webb, “Naïve bayes,” in Encyclopedia of Machine Learning, C. Sammut and G. I. Webb, Eds., pp. 713–714, Springer, New York, NY, USA, 2010. View at Google ScholarG. Feng, J. Guo, B.-Y. Jing, and L. Hao, “A Bayesian feature selection paradigm for text classification,” Information Processing and Management, vol. 48, no. 2, pp. 283–302, 2012. View at Publisher · View at Google Scholar · View at ScopusH. T. T. Nguyen and D. K. Le, “An approach to improving quality of crawlers using naïve bayes for classifier and hyperlink filter,” in Computational Collective Intelligence. Technologies and Applications, N. T. Nguyen, K. Hoang, and P. Jȩdrzejowicz, Eds., pp. 525–535, Springer, Berlin, Germany, 2012. View at Google ScholarZ. Zhang, Q. Zhu, and Y. Xie, “A novel image matting approach based on naive bayes classifier,” in Intelligent Computing Technology, D. S. Huang, C. Jiang, V. Bevilacqua, and J. Figueroa, Eds., pp. 433–441, Springer, Berlin, Germany, 2012. View at Google ScholarM. C. Yang, C. S. Huang, J. H. Chen, and R. F. Chang, “Whole breast lesion detection using naive bayes classifier for portable ultrasound,” Ultrasound in Medicine and Biology, vol. 38, pp. 1870–1880, 2012. View at Google ScholarC. Catal and B. Diri, “Investigating the effect of dataset size, metrics sets, and feature selection techniques on software fault prediction problem,” Information Sciences, vol. 179, no. 8, pp. 1040–1058, 2009. View at Publisher · View at Google Scholar · View at ScopusZ. Hoare, “Landscapes of Naïve Bayes classifiers,” Pattern Analysis and Applications, vol. 11, no. 1, pp. 59–72, 2008. View at Publisher · View at Google Scholar · View at ScopusX. S. Yang, “A new metaheuristic bat-inspired algorithm,” in Nature Inspired Cooperative Strategies For Optimization (NICSO' 10), J. González, D. Pelta, C. Cruz, G. Terrazas, and N. Krasnogor, Eds., pp. 65–74, Springer, Berlin, Germany, 2010. View at Google ScholarT. C. Bora, L. D. S. Coelho, and L. Lebensztajn, “Bat-inspired optimization approach for the brushless DC wheel motor problem,” IEEE Transactions on Magnetics, vol. 48, no. 2, pp. 947–950, 2012. View at Publisher · View at Google Scholar · View at ScopusK. Khan, A. Nikov, and A. Sahai, “A fuzzy bat clustering method for ergonomic screening of office workplaces,” in Proceedings of the 3rd International Conference on Software, Services and Semantic Technologies (S3T '11), D. Dicheva, Z. Markov, and E. Stefanova, Eds., pp. 59–66, Springer, Berlin, Germany, 2011. X. S. Yang and A. H. Gandomi, “Bat algorithm: a novel approach for global engineering optimization,” Engineering Computations, vol. 29, no. 5, 2012. View at Google ScholarA. H. Gandomi, X. S. Yang, A. H. Alavi, and S. Talatahari, “Bat algorithm for constrained optimization tasks,” Neural Computing and Applications, vol. 22, pp. 1239–1255, 2012. View at Google ScholarI. Fister Jr., D. Fister, and X. S. Yang, “A hybrid bat algorithm,” Elektrotehniski Vestnik, vol. 80, pp. 1–7, 2013. View at Google ScholarX. S. Yang, “Bat algorithm for multi-objective optimisation,” International Journal of Bio-Inspired Computation, vol. 3, pp. 267–274, 2011. View at Google ScholarA. M. Taha and A. Y. C. Tang, “Bat algorithm for rough set attribute reduction,” Journal of Theoretical and Applied Information Technology, vol. 51, no. 1, 2013. View at Google ScholarR. Nakamura, L. Pereira, K. Costa, D. Rodrigues, J. Papa, and X. S. Yang, “BBA: a binary bat algorithm for feature selection,” in Proceedings of the 25th Conference on Graphics, Patterns and Images (SIBGRAPI '12), pp. 291–297, 2012. R. S. Parpinelli and H. S. Lopes, “New inspirations in swarm intelligence: a survey,” International Journal of Bio-Inspired Computation, vol. 3, pp. 1–16, 2011. View at Google ScholarD. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, 1989. H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining, 1998. A. Moraglio, C. Di Chio, and R. Poli, “Geometric particle swarm optimisation,” in Proceedings of the 10th European Conference on Genetic Programming (EuroGP '07), pp. 125–136, April 2007. View at ScopusB. Raman and T. R. Ioerger, Instance Based Filter for Feature Selection, 2002. C. L. Blake and C. J. Merz, UCI Repository of Machine Learning Databases, University of California, Irvine, Calif, USA, 1998. Z.-Q. Hong and J.-Y. Yang, “Optimal discriminant plane for a small number of samples and design method of classifier on the plane,” Pattern Recognition, vol. 24, no. 4, pp. 317–324, 1991. View at Publisher · View at Google Scholar · View at ScopusH. W. Lilliefors, “On the Kolmogorov-Smirnov test for normality with mean and variance unknown,” Journal of the American Statistical Association, vol. 62, pp. 399–402, 1967. View at Google ScholarF. Wilcoxon, “Individual comparisons by ranking methods,” Biometrics Bulletin, vol. 1, pp. 80–83, 1945. View at Google ScholarF. Monti, R. Dell’Anna, A. Sanson, M. Fasoli, M. Pezzotti, and S. Zenoni, “A multivariate statistical analysis approach to highlight molecular processes in plant cell walls through ATR FT-IR microspectroscopy: the role of the a-expansin PhEXPA1 in Petunia hybrida,” Vibrational Spectroscopy, vol. 65, pp. 36–43, 2013. View at Google ScholarQ. Duan, D. Miao, H. Zhang, and J. Zheng, “Personalized web retrieval based on rough-fuzzy method,” Journal of Computational Information Systems, vol. 3, no. 3, pp. 1067–1074, 2007. View at Google Scholar · View at Scopus