SIP-FS: a novel feature selection for data representation

EURASIP Journal on Image and Video Processing, Feb 2018

Multiple features are widely used to characterize real-world datasets. It is desirable to select leading features with stability and interpretability from a set of distinct features for a comprehensive data description. However, most existing feature selection methods focus on the predictability (e.g., prediction accuracy) of the selected results yet neglect stability. To obtain a compact data representation, a novel feature selection method is proposed to improve stability and interpretability without sacrificing predictability (SIP-FS). Instead of mutual information, generalized correlation is adopted in minimal redundancy maximal relevance to measure the relation between different feature types. Several feature types (each containing a certain number of features) can then be selected and evaluated quantitatively to determine which types contribute to a specific class, thereby enhancing the interpretability of the features. Moreover, stability is introduced into the criterion of SIP-FS to obtain consistent ranking results. We conduct experiments on three publicly available datasets using the one-versus-all strategy to select class-specific features. The experiments illustrate that SIP-FS achieves significant improvements in stability and interpretability with desirable prediction accuracy, and indicates advantages over several state-of-the-art approaches.



Yiyou Guo, Jinsheng Ji, Hong Huo, and Tao Fang (Department of Automation, Shanghai Jiao Tong University, Dongchuan Road, Shanghai, China); Deren Li (State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Luoyu Road, 430079 Wuhan, China)
Keywords: Data representation; Interpretability; Predictability; Stability

1 Introduction

Nowadays, massive amounts of image data are available in our daily life, including web images and remote sensing images. Numerous features have been proposed to characterize an image, such as global features (color, GIST, shape, and texture) and local features (shape context, and histograms of oriented gradients). For texture alone, up to 30 types of features exist, such as local binary patterns (LBP) [1] and Gabor textures [2]. For color, there also exist several types, such as the color histogram and the color correlogram. Generally, images are described by multiple features that are complementary to each other, so selecting an effective feature subset from a set of distinct features is a great challenge for data representation [3]. To handle this challenge, feature selection [4–8] and subspace learning [9, 10] have been developed to obtain suitable feature representations. Feature selection is commonly used as a preprocessing step for classification, so most feature selection algorithms are designed only for better predictability, such as high prediction accuracy. Although many feature selection methods take both feature relevance and redundancy into account for predictability [11], they neglect stability [12]. If a feature selection method has poor stability, the selected feature subsets change significantly with variations of the training data. Therefore, using only predictability to evaluate feature selection methods may result in inconsistent ranking results for data representation. On the other hand, each feature type describes an image from a single cue and has its own specific property and domain-specific meaning. Different from a scalar feature, feature types, which can be scalars, vectors, or matrices, are highly diverse in dimension and expression.
However, existing methods simply ensemble the selection of each feature type [13] or concatenate all feature types into a single vector [14]. These methods ignore the relation between different feature types. Moreover, they often select a common feature subset for all classes, while this subset might not be optimal for each class. Following ref. [14], the one-versus-all strategy is employed here to select class-specific features. Feature selection selects a subset from the original features rather than deriving a low-dimensional subspace, thereby maintaining the physical meaning of the features, which is beneficial for understanding the data [4]. Therefore, how to select a set of feature types and evaluate their contribution to a specific class is critical for enhancing the interpretability of features. To address the above-mentioned issues, a novel feature selection method is proposed to improve stability and interpretability without sacrificing predictability, the so-called SIP-FS. The main contributions of this paper are as follows. First, generalized correlation rather than mutual information is employed in minimal redundancy maximal relevance to determine which feature types contribute to a specific class, thereby enhancing the interpretability of features. Second, a stability constraint is adopted in SIP-FS to obtain consistent ranking results in the case of data variation. The remainder of this paper is organized as follows. Section 2 presents related work on feature selection, covering predictability, interpretability, and stability. Section 3 illustrates feature selection under different criteria based on predictability, stability, and interpretability. SIP-FS is presented in Section 4. Section 5 discusses the effects of parameters and compares the performance of different methods.
Finally, Section 6 concludes this paper.

2 Related work on feature selection

2.1 Predictability

As an important technique for handling high-dimensional data, feature selection plays an important role in pattern recognition and machine learning. It can be divided into four categories: filter, wrapper, embedded, and hybrid methods [4]. In this study, we focus on filter methods based on different evaluation measures, such as distance criteria (Relief and its variants ReliefF and IRelief [15]), separability criteria (Fisher score [16]), correlation coefficients [17], consistency [18], and mutual information [11]. More details can be found in ref. [19]. In general, the one-versus-all strategy is increasingly used in feature selection to select class-specific features for a certain class rather than a common feature subset for all classes [14].

2.2 Interpretability

Most existing feature selection methods focus on predictability (e.g., prediction accuracy) without considering the correlation between different feature types, weakening the interpretability of the selected results. However, different feature types exhibit various kinds of information, including statistical characteristics and domain-specific meanings. Given a set of distinct feature types, it remains unclear which feature types contribute to a specific class. Haury et al. analyze the influence of feature selection methods on the functional interpretability of molecular signatures [20]. Li et al. utilize association rule mining algorithms to improve the interpretability of the selected result without degrading prediction accuracy [21]. However, these feature selection methods give little consideration to the correlation between two feature types. For different feature types, learning a shared subspace for all classes is a popular strategy to reduce dimensionality.
Although subspace-based methods are suitable for high-dimensional data, they learn a linear or non-linear embedding transformation rather than selecting relevant and significant features from the original feature types. Thus, feature selection is increasingly applied to obtain compact data representations. For example, Wang et al. [22] and Somol et al. [23] proposed to select the most discriminative feature types based on the relationships between different feature types; both methods are sparse feature selection methods rather than filter methods.

2.3 Stability

Feature selection can yield inconsistent results with similar prediction accuracies in the case of data variation. However, a good feature selection method should be robust to data variation. Therefore, it is necessary to develop stability measures for the results of different feature selection methods. Numerous stability measures have been proposed. For example, Somol et al. [24] proposed a series of stability measures, such as feature-focused versus subset-focused measures, selection-registering versus selection-exclusion-registering measures, and subset-size-biased versus subset-size-unbiased measures. At present, a wide variety of stability measures based on physical properties are defined for comparing feature subsets, including the Hamming distance [25], Tanimoto distance [26], average Tanimoto index [27], Ochiai coefficient [28], and other stability measures for subsets of different sizes [24]. For example, Spearman's correlation [26] is used to measure the stability of two weighting vectors, where the top-ranked features are given higher weights. Many factors greatly affect the stability of feature selection, such as the number of samples and the criteria and complexity of the selection method. Although stability measures are widely used for evaluating selected results, they are seldom incorporated into feature selection methods themselves.
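As a concrete illustration of the subset-overlap measures listed above, here is a minimal sketch (our own illustration, not code from the paper) of the Tanimoto index between two selected feature subsets and its average over repeated selection runs:

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto/Jaccard similarity between two selected feature subsets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def average_tanimoto(subsets):
    """Average Tanimoto index over all pairs of repeated selection runs."""
    pairs = list(combinations(subsets, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

# Three runs that mostly agree on features {0, 1, 2} -> high stability
runs = [{0, 1, 2, 5}, {0, 1, 2, 7}, {0, 1, 2, 5}]
print(average_tanimoto(runs))
```

A value near 1 indicates that repeated selections return nearly the same subset; a value near 0 indicates near-disjoint subsets.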
To improve stability, numerous stable feature selection methods have been developed to deal with different sources of instability. These methods can be divided into four categories: (1) ensemble methods [29–31], (2) sample weighting [32], (3) feature grouping [33], and (4) sample injection [34]. Among these, ensemble feature selection is the most popular. An ensemble feature selection method consists of two steps: (1) creating a set of component feature selectors and (2) aggregating the results of the component selectors into an ensemble output. However, ensemble methods combine the selected results according to prediction accuracy, which may result in an imbalance between stability and predictability. By contrast, the proposed SIP-FS adopts a stability measure as an additional constraint in the selection criterion to balance predictability and stability. To the best of our knowledge, stability and interpretability are seldom explored simultaneously in existing feature selection methods.

3 Methodology

This section presents feature selection under different criteria based on predictability, stability, and interpretability, as shown in Fig. 1. Suppose a feature set F with m-dimensional features is extracted using l different feature types for each image, denoted by F = {f_1, f_2, ..., f_m}. If a given feature type G^(i) has m_i dimensions, denoted by G^(i) = {f_1^(i), f_2^(i), ..., f_{m_i}^(i)} with sum_{i=1}^{l} m_i = m, then F can be rewritten by feature types as F_G = {G^(1), G^(2), ..., G^(l)} = {f_1^(1), ..., f_{m_1}^(1), f_1^(2), ..., f_{m_2}^(2), ..., f_1^(l), ..., f_{m_l}^(l)}. As shown in Fig. 1a, G^(i) represents the i-th feature type with a specific color (green, yellow, red, etc.); moreover, each G^(i) has its own specific property and dimensionality.

For predictability, numerous filter models have been developed in feature selection. For example, Min-Redundancy and Max-Relevance (mRMR) [11], a popular filter model, adopts the following criterion:

\[ \max \Phi(D, R), \quad \Phi = D - R \tag{1} \]

\[ D = \frac{1}{|F|} \sum_{f_i \in F} I(f_i; c) \tag{2} \]

\[ R = \frac{1}{|F|^2} \sum_{f_i, f_j \in F} I(f_i; f_j) \tag{3} \]

where |F| represents the dimensionality of the feature set, I(f_i; c) represents the mutual information between an individual feature f_i in F and the class c, and I(f_i; f_j) represents the mutual information between two individual features f_i and f_j in F. From Eqs. (2) and (3), D and R in (1) are computed as the mean of all feature-class relevance values and all feature-feature redundancy values in F, respectively. In practice, the selection can be achieved by a near-optimal incremental search:

\[ f_m = \arg\max_{f_i \in F - F_{m-1}} \Big[ I(f_i; c) - \frac{1}{m-1} \sum_{f_j \in F_{m-1}} I(f_i; f_j) \Big] \tag{4} \]

where F_{m-1} represents the (m-1)-dimensional feature subset that has already been selected from F. Equation (4) selects the m-th feature from the candidate set F - F_{m-1} and trades off high class relevance against low feature redundancy. As shown in Fig. 1b, the features selected from the same feature type are scattered in the ranking, which hampers the quantitative evaluation of multiple features and results in a lack of interpretability. In addition, the selected results may change greatly under data fluctuation.

In addition to predictability, stability is another important measure in feature selection. Existing stability indexes are typically used only to evaluate feature selection methods rather than to improve the stability of the methods themselves [24]. To the best of our knowledge, stability is seldom considered in feature selection criteria. Therefore, a stability constraint is employed in this study to obtain robust selection results:

\[ f_{\mathrm{opt}} = \arg\max \, (D - R + k \times S) \tag{5} \]

where S represents an existing stability evaluation index and k is a parameter that balances the prediction factor (D - R) and the stability factor S.
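To make the incremental search of Eq. (4) concrete, here is a minimal sketch (our own illustration, not the authors' code) using a naive plug-in estimate of mutual information for discrete-valued features:

```python
import numpy as np

def mutual_info(x, y):
    """Plug-in estimate of I(x; y) in nats for discrete-valued vectors."""
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            pxy = np.mean((x == a) & (y == b))
            if pxy > 0:
                mi += pxy * np.log(pxy / (np.mean(x == a) * np.mean(y == b)))
    return mi

def mrmr(X, c, n_select):
    """Incremental mRMR search of Eq. (4): at each step pick the feature
    maximizing relevance I(f; c) minus mean redundancy to the features
    already selected."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_select:
        best, best_score = None, -np.inf
        for i in remaining:
            rel = mutual_info(X[:, i], c)
            red = (np.mean([mutual_info(X[:, i], X[:, j]) for j in selected])
                   if selected else 0.0)
            if rel - red > best_score:
                best, best_score = i, rel - red
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, on a toy matrix whose first two columns both copy the class label and whose third column is independent of it, the first feature picked is the relevant one, and the redundancy term then penalizes its duplicate.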
The stability evaluation index of a candidate feature can be computed by:

\[ S(f, F) = \frac{1}{i-1} \sum_{j=1}^{i-1} S(F_f, F_j) \tag{6} \]

\[ S(F_f, F_j) = \frac{|F_f \cap F_j|}{|F_f \cup F_j|} \tag{7} \]

where F_f is the union of the already selected features and the candidate feature f in the current selection, F_j (j = 1, 2, ..., i-1) represents the j-th previously selected feature subset, and |F_f ∩ F_j| and |F_f ∪ F_j| represent the sizes of the intersection and union of F_f and F_j, respectively. Unlike Eq. (1), both predictability and stability are used in the feature selection criterion of Eq. (5). As shown in Fig. 1c, the stability constraint helps obtain consistent ranking results.

Similar to predictability and stability, interpretability is essential for feature selection [20]. However, mutual information fails to measure the correlation between different types of features, because the required multivariate densities are hard to estimate accurately; both Eqs. (1) and (5) therefore fail to select interpretable results. Instead of mutual information, the generalized correlation coefficient (GCC) is adopted to measure D and R in Eqs. (1)–(5) while preserving predictability. Given v-1 feature types F̄_G^{v-1} = Ḡ^(1) ∪ Ḡ^(2) ∪ ... ∪ Ḡ^(v-1) selected from the entire feature set F_G of l types, where Ḡ^(x) denotes the x-th selected feature type (x = 1, 2, ..., v-1), the v-th type Ḡ^(v) is selected from the candidate set F_G - F̄_G^{v-1} according to:

\[ \bar{G}^{(v)} = \arg\max_{G^{(j)} \in F_G - \bar{F}_G^{v-1}} \Big[ \rho(G^{(j)}, c) - \frac{1}{v-1} \sum_{\bar{G}^{(i)} \in \bar{F}_G^{v-1}} \rho(G^{(j)}, \bar{G}^{(i)}) + k \times S \Big] \tag{8} \]

where ρ represents the generalized correlation coefficient between feature types (or between a feature type and the class c), Ḡ^(i) denotes the i-th selected feature type, and G^(j) denotes a feature type in the candidate set F_G - F̄_G^{v-1}. The generalized correlation coefficient degrades to Pearson's correlation coefficient when Ḡ^(i) and G^(j) are one-dimensional. When only GCC is used in Eq. (8), i.e., k = 0, the corresponding feature selection takes predictability and interpretability into account, as shown in Fig. 1d: the selected features of the same feature type are close to each other, while their ranking may still change greatly under data fluctuation. If k ≠ 0 in Eq. (8), the feature selection simultaneously takes predictability, stability, and interpretability into account; this is the so-called SIP-FS method of this paper, as shown in Fig. 1e. From an interpretive point of view, the features selected by SIP-FS are meaningful class-specific features [35] obtained with the one-versus-all strategy.

4 SIP-FS algorithm

SIP-FS aims to efficiently select a reasonable and compact feature subset for data representation; the selected result should therefore be meaningful and insensitive to data fluctuation while performing well in prediction accuracy. SIP-FS iterates until the selection becomes stable and takes the feature subset obtained at the last iteration as the final result. For the i-th iteration, k = λ1 × i, and the stability S_i is computed as the mean of the stabilities between F_i and all F_j (j = 1, 2, ..., i-1), where F_i and F_j represent the i-th and j-th selected feature subsets, respectively:

\[ S_i = \frac{1}{i-1} \sum_{j=1}^{i-1} S(F_i, F_j) \tag{9} \]

The iteration stops when the following condition is satisfied:

\[ |S_i - S_{i-1}| \to 0 \tag{10} \]

Each iteration consists of two parts: (1) selecting feature types, corresponding to steps 3 to 6 of Algorithm 1, and (2) removing redundancy within each selected feature type, corresponding to steps 7 to 12 of Algorithm 1. In the first part, feature types are selected based on Eq. (8) until the remaining feature types cannot provide additional information, i.e., until

\[ \rho(G^{(j)}, c) - \frac{1}{v-1} \sum_{\bar{G}^{(i)} \in \bar{F}_G^{v-1}} \rho(G^{(j)}, \bar{G}^{(i)}) + k \times S \to 0 \tag{11} \]

The first part yields a ranking of feature types; however, within each selected feature type there may still exist redundancy.
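The greedy type-ranking structure of Eq. (8) can be sketched as follows. The exact definition of the generalized correlation ρ between vector-valued feature types is not reproduced here, so the sketch substitutes a simple proxy (mean absolute Pearson correlation over column pairs); it illustrates the greedy loop only, not the authors' implementation:

```python
import numpy as np

def gcc(A, B):
    """Proxy for the generalized correlation rho between two feature-type
    matrices (n_samples x dims): mean absolute Pearson correlation over all
    column pairs. This proxy is our assumption, not the paper's definition."""
    cors = [abs(np.corrcoef(a, b)[0, 1]) for a in A.T for b in B.T]
    return float(np.mean(cors))

def rank_types(types, c, k=0.0, S=0.0):
    """Greedy feature-type ranking in the spirit of Eq. (8): at each step pick
    the type with maximal class relevance minus mean redundancy to the types
    already chosen, plus the k*S stability term."""
    chosen, remaining = [], list(range(len(types)))
    while remaining:
        scores = [gcc(types[j], c[:, None])
                  - (np.mean([gcc(types[j], types[i]) for i in chosen])
                     if chosen else 0.0)
                  + k * S
                  for j in remaining]
        chosen.append(remaining.pop(int(np.argmax(scores))))
    return chosen  # indices of feature types, most relevant first
```

A feature type whose columns track the class label is ranked ahead of one that is only weakly correlated with it, and redundancy to already chosen types pushes duplicated types down the ranking.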
Therefore, in the second part, the redundancy within each feature type is further removed by selecting a subset. Given that m-1 features have already been selected from the v-th feature type, the m-th feature f̄_m^(v) is selected as follows:

\[ \bar{f}_m^{(v)} = \arg\max_{f_m^{(v)} \in G^{(v)} - \bar{G}_{m-1}^{(v)}} \Big[ \rho(G_m^{(v)}, c) - \frac{1}{m-1} \sum_{\bar{f}_j^{(v)} \in \bar{G}_{m-1}^{(v)}} \rho(f_m^{(v)}, \bar{f}_j^{(v)}) + k \times S \Big] \tag{12} \]

where G_m^(v) = Ḡ_{m-1}^(v) ∪ {f_m^(v)} = {f̄_1^(v), f̄_2^(v), ..., f̄_{m-1}^(v), f_m^(v)} and f_m^(v) denotes a certain feature in the candidate feature set. For the v-th feature type G^(v), a subset is obtained until the remaining features cannot provide additional information, i.e., until the objective of Eq. (12) approaches zero:

\[ \rho(G_m^{(v)}, c) - \frac{1}{m-1} \sum_{\bar{f}_j^{(v)} \in \bar{G}_{m-1}^{(v)}} \rho(f_m^{(v)}, \bar{f}_j^{(v)}) + k \times S \to 0 \tag{13} \]

5 Results and discussions

In this section, extensive experiments are conducted to illustrate the effectiveness of SIP-FS in terms of predictability, stability, and interpretability. Four feature selection methods, mRMR, ReliefF, En-mRMR, and En-ReliefF, are used for performance comparisons on three publicly available datasets: two web image datasets, MIML [36] and NUS-WIDE-LITE [37], and a remote sensing image dataset, USGS21 [38]. mRMR and ReliefF are commonly used filter methods, while En-mRMR and En-ReliefF are two ensemble methods. The one-versus-all strategy is adopted to select class-specific features for SIP-FS as well as for the comparison methods. For the three datasets, different types of features are used, each normalized individually. LIBSVM [39] is used for training and classification. The images in each dataset are divided into two equal parts, one for training and the other for testing. Experiments are randomly repeated 10 times, and average results are reported.

5.1 Datasets

MIML consists of five classes: desert, mountain, sea, sunset, and trees, containing 340, 268, 341, 261, and 378 images, respectively. Figure 2 shows sample images of this dataset.
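The one-versus-all setup used throughout the experiments reduces each class to a binary problem so that class-specific features can be selected; a trivial sketch with hypothetical labels:

```python
def one_versus_all(labels, target):
    """Relabel a multi-class problem as binary: 1 for the target class and
    0 otherwise, so features can be selected per class."""
    return [1 if y == target else 0 for y in labels]

# e.g., class-specific selection for "mountain" in MIML
y = ["desert", "mountain", "sea", "mountain", "sunset"]
print(one_versus_all(y, "mountain"))  # [0, 1, 0, 1, 0]
```

Running the selector once per class with these binary labels yields a (possibly different) feature subset for every class.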
Eight feature types (638 dimensions in total), namely color histogram, color moments, color coherence, textures, Tamura-texture coarseness, Tamura-texture directionality, edge orientation histogram, and SBN colors, are used in the experiments; their dimensions are 256, 6, 128, 15, 10, 8, 80, and 135, respectively. NUS-WIDE-LITE contains images from Flickr.com collected by the National University of Singapore. In the experiments, images with no label or more than one label are removed, resulting in a single-label dataset with nine classes: birds, boats, flowers, rocks, sun, tower, toy, tree, and vehicle, as shown in Fig. 3. Five feature types (634 dimensions in total), namely color histogram, block-wise color moments, color correlogram, edge direction histogram, and wavelet texture, are used for the experimental evaluation; their dimensions are 64, 225, 144, 73, and 128, respectively. USGS21 contains 21 classes: agricultural, airplane, baseball diamond, beach, buildings, chaparral, dense residential, forest, freeway, golf course, harbor, intersection, medium density residential, mobile home park, overpass, parking lot, river, runway, sparse residential, storage tanks, and tennis courts, as shown in Fig. 4. Each class consists of 100 images of 256 × 256 pixels with a spatial resolution of one foot. Five feature types (809 dimensions in total), namely color moment, HOG, Gabor, LBP, and GIST, extracted by [40], are used for evaluation; their dimensions are 81, 37, 120, 59, and 512, respectively.

5.2 Effects of λ1 and λ2 on stability

In the proposed method, two parameters, λ1 and λ2, influence stability. λ1 determines the k value, which balances predictability and stability, while λ2 determines the proportion of the subsamples generated in iterative feature selection. A suitable combination of λ1 and λ2 is beneficial for obtaining consistent results. Parameter tuning is conducted for each class individually.
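The roles of λ1 and λ2 in the iterative procedure of Section 4 (Eqs. (9)–(10)) can be sketched as follows; `select_fn` stands in for the Eq. (8) selection step and is our own placeholder, not the authors' code:

```python
import random

def jaccard(a, b):
    """Subset similarity used in the stability index (Eq. (9))."""
    return len(a & b) / len(a | b)

def select_until_stable(select_fn, data, lam1=0.01, lam2=0.8, eps=1e-3, max_iter=50):
    """Sketch of the SIP-FS outer loop: on each iteration i draw a subsample of
    proportion lam2, run the base selector with k = lam1 * i, and stop once the
    mean pairwise Jaccard stability S_i changes by less than eps (Eq. (10))."""
    subsets, prev = [], None
    for i in range(1, max_iter + 1):
        sample = random.sample(data, int(lam2 * len(data)))
        subsets.append(frozenset(select_fn(sample, lam1 * i)))
        if len(subsets) > 1:
            s = sum(jaccard(subsets[-1], f) for f in subsets[:-1]) / (len(subsets) - 1)
            if prev is not None and abs(s - prev) < eps:
                break
            prev = s
    return subsets[-1]
```

A small λ2 makes successive subsamples (and hence selections) fluctuate more, which is consistent with the observation below that large λ2 values give more consistent results.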
Figure 5 shows the influence of λ1 and λ2 on stability for three different classes, where λ1 is chosen from {0.0001, 0.001, 0.01, 0.1, 1, 10} and λ2 from {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}. In general, high stability is obtained with a moderate λ1 (e.g., 0.001, 0.01, or 0.1) and a large λ2 (0.8 or 0.9), compared with other parameter combinations. A smaller λ1 yields better stability, yet the computational complexity increases significantly. A small λ2 may result in high fluctuation of the subsamples, leading to inconsistent selection results.

5.3 Stability analysis

Tables 1, 2, and 3 (Table 3: stability comparisons on USGS21) show the stability comparisons of the five methods on the three datasets; the stabilities of each class and of the entire dataset (average) are given. The stability value ranges from 0 to 1, where "0" and "1" indicate that the rankings of the selected results are completely inconsistent and completely consistent, respectively, over randomly repeated feature selections. Compared with the other methods, SIP-FS achieves a significant stability improvement for each class (except for "dense residential" and "medium residential" in Table 3) as well as for the entire dataset, indicating that SIP-FS helps select much more stable features. In general, mRMR combined with the ensemble strategy does not show a significant stability improvement. Though the ensemble strategy brings a slight stability advantage for ReliefF, En-ReliefF still performs worse than SIP-FS. Overall, SIP-FS performs best on the three datasets in terms of stability.

5.4 Interpretability analysis

Given a certain class, the prediction accuracy varies with the feature types. How to select feature types and measure their effectiveness for a specific class is essential for interpretability analysis.
In particular, the one-versus-all strategy is combined with SIP-FS to select feature types (each containing a certain number of features) for a specific class. The effectiveness of these feature types for each class is measured by the relative contribution ratio, normalized by the respective maximum contribution [14]. Figures 6, 7, and 8 show the selected feature types for each class together with their relative contribution ratios. For example, the feature types selected for "mountain" in MIML are shape and color features, as shown in Fig. 6. According to the relative contribution ratios, the selected feature types are edge orientation histogram, color coherence, color histogram, and SBN colors. The most discriminative feature type is shape, and the other three are color features (color coherence, color histogram, and SBN colors), whereas some texture features (textures, Tamura-texture coarseness, and Tamura-texture directionality) and a redundant color feature (color moments) are removed. As shown in Fig. 7, color correlogram, edge direction histogram, and wavelet texture provide complementary information for describing each class in the NUS-WIDE-LITE dataset. In addition, block-wise color moments provide less information for most classes in this dataset, while color moments are useless because of information redundancy. In the USGS21 dataset, take the broad classes road (including freeway, overpass, and runway) and water (including beach and river) as two examples, as shown in Fig. 8. LBP is the most discriminative feature type for "road", while color moment is the most discriminative feature type for "water". Furthermore, as a subclass of water, river needs additional complementary information provided by the other four feature types (LBP, Gabor, HOG, and GIST) besides color moment.
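The relative contribution ratio used in Figs. 6–8 normalizes each selected type's contribution by the class maximum; a minimal sketch with hypothetical numbers (the scores below are made up for illustration):

```python
def relative_contribution(contributions):
    """Normalize per-feature-type contributions by the maximum, giving the
    relative contribution ratio (cf. ref. [14])."""
    peak = max(contributions.values())
    return {t: v / peak for t, v in contributions.items()}

# Hypothetical raw contributions of the types selected for one class
scores = {"edge_orientation_hist": 0.42, "color_coherence": 0.21, "color_hist": 0.14}
print(relative_contribution(scores))
```

The most discriminative type always maps to 1.0, so the ratios of different classes become directly comparable.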
In general, SIP-FS provides a more interpretable data representation than the existing feature selection methods.

5.5 Predictability analysis

Tables 4, 5, and 6 show the prediction accuracy of each class on the three datasets (Table 5: predictability comparisons on NUS-WIDE-LITE; Table 6: predictability comparisons on USGS21). The predictability value ranges from 0 to 1, where "0" and "1" represent complete misclassification and completely correct classification, respectively. From Table 4, mRMR performs better than the other methods on four classes of MIML (mountains, sea, sunset, and trees). Although SIP-FS performs worse than mRMR in terms of average performance, it shows advantages over the other three methods. From Table 5, mRMR and SIP-FS perform best among all methods in terms of average performance; the comparison indicates that each has accuracy advantages on some classes. For example, the prediction accuracies of SIP-FS on boats, rocks, sun, and vehicle are higher than those of mRMR. From Table 6, the average predictability of mRMR, En-mRMR, and SIP-FS indicates significant advantages over that of ReliefF and En-ReliefF. It is worth noting that although En-ReliefF obtains the highest stability on "dense residential" and "medium residential" (Table 3), it has the lowest prediction accuracy (Table 6). In general, SIP-FS and mRMR perform best among all comparison methods on the three datasets, demonstrating that SIP-FS maintains good predictability.

To further investigate the effect of the number of selected features on predictability, Fig. 9 shows the prediction accuracy of the five feature selection methods on three different classes. In general, the prediction accuracy of the five methods tends to increase as the number of selected features increases. Desirable prediction results can be obtained by selecting the leading features, e.g., 20 (trees), 30 (flowers), and 20 (buildings) features, corresponding to Fig. 9a–c.

5.6 Trade-off between stability and predictability

In this section, the stability-predictability trade-off (SPT) is used to provide a formal and automatic way of jointly evaluating the trade-off between stability and predictability, as in ref. [29]. The SPT is defined as:

\[ \mathrm{SPT} = \frac{2 \times \mathrm{stability} \times \mathrm{predictability}}{\mathrm{stability} + \mathrm{predictability}} \tag{14} \]

where stability (Tables 1, 2, and 3) and predictability (Tables 4, 5, and 6) denote the average performance. SPT ranges from 0 to 1; the higher the SPT, the better the performance. The SPTs for the three datasets are shown in Fig. 10. Two conclusions can be drawn from Fig. 10: (1) compared with the other methods, SIP-FS obtains a better trade-off between stability and predictability; (2) mRMR and ReliefF combined with the ensemble strategy indicate higher SPT than without it.

6 Conclusions

In this study, a novel feature selection method called SIP-FS is proposed to explore stability and interpretability simultaneously while preserving predictability. Given a set of distinct feature types, the relation between different feature types is measured by minimal redundancy maximal relevance based on generalized correlation. Several feature types can then be selected and used to determine which types contribute to a specific class through quantitative evaluation. Furthermore, consistent ranking results can be achieved by incorporating stability into the criterion of SIP-FS.
The experiments on three datasets, MIML, NUS-WIDE-LITE, and USGS21, demonstrate that stability and interpretability are significantly improved without sacrificing predictability, compared with other filter methods and their respective ensemble-based variants. In future work, we intend to further investigate the selection of multi-modal information using SIP-FS.

Acknowledgements
Not applicable.

Funding
This work was supported by the National Key Basic Research and Development Program of China under Grant 2012CB719903, the Science Fund for Creative Research Groups of the National Natural Science Foundation of China under Grant 61221003, the National Natural Science Foundation of China under Grants 41071256 and 41571402, and the National Science Foundation of China Youth Program under Grant 41101386.

Availability of data and materials
Not applicable.

Authors' contributions
YG and JJ implemented the algorithms and performed most of the experiments. HH, TF and DL revised the manuscript. All authors read and approved the final manuscript.

Competing interests
The authors declare that they have no competing interests.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. T. Ojala, M. Pietikainen, D. Harwood, in Proceedings of the 12th International Conference on Pattern Recognition. Performance evaluation of texture measures with classification based on Kullback discrimination of distributions (IEEE, New York, 2002), pp. 582-585
2. T.S. Lee, Image representation using 2D Gabor wavelets. IEEE Trans. Pattern Anal. Mach. Intell. 18(10), 959-971 (1996)
3. X. Jiang, J. Lai, Sparse and dense hybrid representation via dictionary decomposition for face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(5), 1067-1079 (2015)
4. H. Liu, L. Yu, Toward integrating feature selection algorithms for classification and clustering. IEEE Trans. Knowl. Data Eng. 17(4), 491-502 (2005)
5. Z. Li, J. Liu, Y. Yang, X. Zhou, H. Lu, Clustering-guided sparse structural learning for unsupervised feature selection. IEEE Trans. Knowl. Data Eng. 26(9), 2138-2150 (2014)
6. P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 711-720 (2002)
7. F. Zamani, M. Jamzad, A feature fusion based localized multiple kernel learning system for real world image classification. EURASIP J. Image Video Process. 2017(1), 78 (2017)
8. F. Poorahangaryan, H. Ghassemian, A multiscale modified minimum spanning forest method for spatial-spectral hyperspectral images classification. EURASIP J. Image Video Process. 2017(1), 71 (2017)
9. X. He, P. Niyogi, in 17th Annual Conference on Neural Information Processing Systems (NIPS). Locality preserving projections (MIT Press, Cambridge, 2003), pp. 186-197
10. Y. Wang, C. Han, C. Hsieh, K. Fan, Vehicle color classification using manifold learning methods from urban surveillance videos. EURASIP J. Image Video Process. 2014, 48 (2014)
11. H. Peng, F. Long, C.H.Q. Ding, Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1226-1238 (2005)
12. Y. Li, J. Si, G. Zhou, S. Huang, S. Chen, FREL: a stable feature selection algorithm. IEEE Trans. Neural Netw. Learn. Syst. 26(7), 1388-1402 (2017)
13. T. Le, S. Kim, On measuring confidence levels using multiple views of feature set for useful unlabeled data selection. Neurocomputing 173, 1589-1601 (2016)
14. X. Chen, T. Fang, H. Huo, D. Li, Measuring the effectiveness of various features for thematic information extraction from very high resolution remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 53(9), 4837-4851 (2015)
15. Y. Sun, Iterative RELIEF for feature weighting: algorithms, theories, and applications. IEEE Trans. Pattern Anal. Mach. Intell. 29(6), 1035-1051 (2007)
16. C.M. Bishop, Pattern Recognition and Machine Learning, 5th edn. Information Science and Statistics (Springer, New Haven, 2007)
17. H. Wei, S.A. Billings, Feature subset selection and ranking for data dimensionality reduction. IEEE Trans. Pattern Anal. Mach. Intell. 29(1), 162-166 (2007)
18. M. Dash, H. Liu, Consistency-based search in feature selection. Artif. Intell. 151(1-2), 155-176 (2003)
19. I. Guyon, A. Elisseeff, An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157-1182 (2003)
20. A.-C. Haury, P. Gestraud, J.-P. Vert, The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures. PLoS ONE 6(12), e28210 (2011)
21. J. Li, H. Liu, S.-K. Ng, L. Wong, Discovery of significant rules for classifying cancer diagnosis data. Bioinformatics 19, 93-102 (2003)
22. W. Hu, W. Li, X. Zhang, S.J. Maybank, Single and multiple object tracking using a multi-feature joint sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 37(4), 816-833 (2015)
23. H. Wang, F. Nie, H. Huang, in Proceedings of the 30th International Conference on Machine Learning. Multi-view clustering and feature learning via structured sparsity, vol. 28 (JMLR.org, Atlanta, 2013), pp. 1389-1397
24. P. Somol, J. Novovicová, Evaluating stability and comparing output of feature selectors that optimize feature subset cardinality. IEEE Trans. Pattern Anal. Mach. Intell. 32(11), 1921-1939 (2010)
25. K. Dunne, P. Cunningham, F. Azuaje, Solutions to instability problems with sequential wrapper-based approaches to feature selection. Technical Report TCD-CS-2002-28, pp. 1-22 (2002)
26. A. Kalousis, J. Prados, M. Hilario, Stability of feature selection algorithms: a study on high-dimensional spaces. Knowl. Inf. Syst. 12(1), 95-116 (2007)
27. S. Loscalzo, L. Yu, C.H.Q. Ding, in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Consensus group stable feature selection (ACM, New York, 2009), pp. 567-576
28. M. Zucknick, S. Richardson, E.A. Stronach, Comparing the characteristics of gene expression profiles derived by univariate and multivariate classification methods. Stat. Appl. Genet. Mol. Biol. 7(1), 95-116 (2008)
29. Y. Saeys, T. Abeel, Y.V. de Peer, in Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases. Robust feature selection using ensemble feature selection techniques (Springer, Berlin, 2008), pp. 313-325
30. Y. Li, S. Gao, S. Chen, in Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence. Ensemble feature weighting based on local learning and diversity (AAAI, Menlo Park, 2012)
31. A. Woznica, P. Nguyen, A. Kalousis, in The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Model mining for robust feature selection (ACM, New York, 2012), pp. 913-921
32. Y. Han, L. Yu, in The 10th IEEE International Conference on Data Mining (ICDM 2010). A variance reduction framework for stable feature selection (IEEE, Washington, 2010), pp. 206-215
33. L. Yu, Y. Han, M.E. Berens, Stable gene selection from microarray data via sample weighting. IEEE/ACM Trans. Comput. Biol. Bioinform. 9(1), 262-272 (2012)
34. L. Yu, C.H.Q. Ding, S. Loscalzo, in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Stable feature selection via dense feature groups (ACM, New York, 2008), pp. 803-811
35. X. Chen, G. Zhou, Y. Chen, G. Shao, Y. Gu, Supervised multiview feature selection exploring homogeneity and heterogeneity with l1,2-norm and automatic view generation. IEEE Trans. Geosci. Remote Sens. 55(4), 2074-2088 (2017)
36. Z. Zhou, M. Zhang, in The Twentieth Annual Conference on Neural Information Processing Systems. Multi-instance multi-label learning with application to scene classification (MIT Press, Cambridge, 2006), pp. 1609-1616
37. T. Chua, J. Tang, R. Hong, H. Li, Z. Luo, Y. Zheng, in Proceedings of the 8th ACM International Conference on Image and Video Retrieval. NUS-WIDE: a real-world web image database from National University of Singapore (ACM, New York, 2009)
38. Y. Yang, S.D. Newsam, in Proceedings of the 18th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems. Bag-of-visual-words and spatial extensions for land-use classification (ACM, New York, 2010), pp. 270-279
39. C. Chang, C. Lin, LIBSVM: a library for support vector machines. ACM TIST 2(3), 27:1-27:27 (2011)
40. The Feature Extraction. http://jkzhu.github.io/felib.html. Accessed 2014



Yiyou Guo, Jinsheng Ji, Hong Huo, Tao Fang, Deren Li. SIP-FS: a novel feature selection for data representation, EURASIP Journal on Image and Video Processing, 2018, 14, DOI: 10.1186/s13640-018-0252-3