Abstract
Feature selection is frequently used as a preprocessing step to data mining and is attracting growing attention due to the increasing amounts of data emerging from different domains. The large data dimensionality increases the noise and thus the error of learning algorithms. Filter methods for feature selection are specially very fast and useful for high-dimensional datasets. Existing methods focus on producing feature subsets that improve predictive performance, but they often suffer from instability. Instance-based filters, for example, are considered as one of the most effective methods that rank features based on instances neighborhood. However, as the feature weight fluctuates with the instances, small changes in training data result in a different selected subset of features. By another hand, some other filters generate stable results but lead to a modest predictive performance. The absence of a trade-off between stability and classification accuracy decreases the reliability of the feature selection results. In order to deal with this issue, we propose filter methods that improve stability of feature selection while preserving an optimal predictive accuracy and without increasing the complexity of the feature selection algorithms. The proposed approaches first use the strength of instance learning to identify initial sets of relevant features, and the advantage of aggregation techniques to increase the stability of the final set in a second stage. Two classification algorithms are used to evaluate the predictive performance of our proposed instance-based filters compared to state-of-the-art algorithms. The obtained results show the efficiency of our methods in improving both classification accuracy and feature selection stability for high-dimensional datasets.
Similar content being viewed by others
References
Guyon I, Elisseff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
Li S, Karatzoglou A, Gentile C (2016) Collaborative filtering bandits. In: Proceedings of the 39th international ACM SIGIR conference on research and development in information retrieval. pp 539–548
Korda N, Szorenyi B, Li S (2016) Distributed clustering of linear bandits in peer to peer networks. In: Proceedings of the 33rd international conference on international conference on machine learning. pp 1301–1309
Feldman D, Schmidt M, Sohler C (2013) Turning big data into tiny data: constant-size coresets for k-means, PCA and projective clustering. In: Proceedings of the 24th annual ACM-SIAM symposium on discrete algorithms
Hu X, Zhou P, Li P et al (2018) A survey on online feature selection with streaming features. Front Comput Sci 12:479–493
Ghaddar B, Naoum-Sawaya J (2018) High dimensional data classification and feature selection using support vector machines. Eur J Oper Res 265(3):993–1004
Ben Brahim A, Limam M (2018) Ensemble feature selection for high dimensional data: a new method and a comparative study. Adv Data Anal Classif 12(4):937–952
Saeys Y, Inza I, Larranaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23:2507–2517
Jain A, Zongker D (1997) Feature selection: evaluation, application, and small sample performance. IEEE Trans Pattern Anal Mach Intell (TPAMI) 19:153–158
Kuncheva LI, Rodríguez JJ (2018) On feature selection protocols for very low-sample-size data. Pattern Recognit 81:660–673
Vabalas A, Gowen E, Poliakoff E, Casson AJ (2019) Machine learning algorithm validation with a limited sample size. PLoS ONE 14(11):e0224365
Kalousis A, Prados J, Hilario M (2007) Stability of feature selection algorithms: a study on high-dimensional spaces. Knowl Inf Syst 12(1):95–116
Nogueira S, Sechidis K, Brown G (2018) On the stability of feature selection algorithms. J Mach Learn Res 18(174):1–54
He Z, Yu W (2010) Review article: stable feature selection for biomarker discovery. Comput Biol Chem 34(4):215–225
Bommert A, Sun X, Bischl B, Rahnenführer J, Langa M (2020) Benchmark for filter methods for feature selection in high-dimensional classification data. Comput Stat Data Anal 143:106839
Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97:273–324
Ben Brahim A, Limam M (2016) A hybrid feature selection method based on instance learning and cooperative subset search. Pattern Recognit Lett 69(C):28–34
Urbanowicz RJ, Meeker M, LaCava W, Olson RS, Moore JH (2018) Relief-based feature selection: introduction and review. J Biomed Inform 85:189–203
Hu Q, Pan W, Song Y, Yu D (2012) Large-margin feature selection for monotonic classification. Knowl Based Syst 31:8–18
Yu Q, Jiang S, Wang R et al (2017) A feature selection approach based on a similarity measure for software defect prediction. Front Inf Technol Electron Eng 18:1744–1753
Kira K, Rendell L (1992) A practical approach to feature selection. In: Sleeman D, Edwards P (eds) International conference on machine learning. pp 368–377
Sun Y, Todorovic S, Goodison S (2010) Local learning based feature selection for high dimensional data analysis. IEEE Trans Pattern Anal Mach Intell (TPAMI) 32:1610–1626
Robnik SM, Kononenko I (2003) Theoretical and empirical analysis of ReliefF and RReliefF. Mach Learn 53:23–69
Ben Brahim A, Limam M (2014) New prior knowledge based extensions for stable feature selection. In: Proceedings of the 6th international conference of soft computing and pattern recognition. IEEE, pp 306-311
Ben Brahim A, Kalousis A (2017) Semi supervised relevance learning for feature selection on high dimensional data. In: Proceedings of the 14th international conference on computer systems and applications. IEEE, pp 579–584
Loscalzo S, Yu L, Ding CHQ (2009) Consensus group stable feature selection. In: KDD. ACM, pp 567–576
Jerbi W, Ben Brahim A, Essoussi N (2016) A hybrid embedded-filter method for improving feature selection stability of random forests. In: Proceedings of the 16th international conference on hybrid intelligent systems. Springer, pp 370–379
Zhou Q, Ding J, Ning Y, Luo L, Li T (2014) Stable feature selection with ensembles of multi-reliefF. In: Proceedings of the 10th international conference on natural computation. IEEE, pp 742–747
Moon M, Nakai K (2016) Stable feature selection based on the ensemble L1-norm support vector machine for biomarker discovery. BMC Genom 17:1026
Han Y, Yu L (2010) A variance reduction framework for stable feature selection. In: Proceedings of the international conference on data mining. pp 206–215
Abeel T, Helleputte T, Van de Peer Y, Dupont P, Saeys Y (2010) Robust biomarker identification for cancer diagnosis with ensemble feature selection methods. Bioinformatics 26:392–398
Peng H, Long F, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27:1226–1238
Shipp MA, Ross KN, Tamayo P, Weng AP, Kutok JL, Aguiar RC, Gaasenbeek M, Angelo M, Reich M, Pinkus GS, Ray TS, Koval MA, LastA KW, Norton TA, Lister J Mesirov, Neuberg DS (2000) Diffuse large b-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat Med 9:68–74
Dyrskjot L, Thykjaer T, Kruhoffer M, Jensen JL, Marcussen N, Hamilton-Dutoit S, Wolf H, Orntoft TF (2003) Identifying distinct classes of bladder carcinoma using microarrays. Nat Genet 33:90–96
Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T Jr, Hudson J, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO, Staudt LM (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403(6769):503–511
Troyanskaya OG, Cantor M, Sherlock G, Brown PO, Hastie T, Tibshirani R, Botstein D, Altman RB (2001) Missing value estimation methods for DNA microarrays. Bioinformatics 17(6):520–525
Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, Tamayo P, Renshaw AA, D’Amico AV, Richie JP, Lander ES, Loda M, Kantoff PW, Golub TR, Sellers WR (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1(2):203–209
Vant Veer LJ (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415:530–536
Pomeroy SL (2002) Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415:436–442
Gordon G (2002) Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res 62:4963–4967
kuncheva L (2007) A stability index for feature selection. In: Proceedings of the 25th IASTED international multi-conference: artificial intelligence and applications. pp 390–395
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Ben Brahim, A. Stable feature selection based on instance learning, redundancy elimination and efficient subsets fusion. Neural Comput & Applic 33, 1221–1232 (2021). https://doi.org/10.1007/s00521-020-04971-y
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00521-020-04971-y