Abstract
Feature selection is a central issue in machine learning and applied mathematics. Filter feature selection algorithms aim to solve the optimization problem of selecting a set of features that maximizes the feature-class correlation while minimizing the feature-feature correlation. Mrmr (Minimum Redundancy Maximum Relevance) and Cfs (Correlation-based Feature Selection) are two of the best-known algorithms that can find an approximate solution to this optimization problem. However, the ever-growing availability of data makes the feature selection process increasingly challenging. In this paper, we propose two new versions of Mrmr and Cfs that output the same feature set as the original algorithms but are considerably faster. Our novel algorithms are based on solving the duplication and redundancy problems intrinsic to the original algorithms. We applied our proposals to thirty datasets from the fields of microarray and cancer analysis. Experiments revealed that the proposed algorithms Mrmr+ and Cfs+ are on average fourteen and three times faster than the original algorithms, respectively.
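To make the relevance/redundancy trade-off concrete, the sketch below implements the standard greedy mRMR criterion (mutual-information relevance minus mean mutual-information redundancy) on discretized features. It illustrates the optimization problem described above, not the accelerated Mrmr+ or Cfs+ algorithms proposed in the paper; the function name and synthetic data are illustrative assumptions.

```python
# Minimal sketch of the standard greedy mRMR criterion: at each step, pick the
# feature with the largest (relevance - mean redundancy) score. This is NOT the
# paper's Mrmr+; it only illustrates the underlying optimization objective.
# Assumes features are already discretized (integer-valued columns).
import numpy as np
from sklearn.metrics import mutual_info_score


def greedy_mrmr(X, y, k):
    """Select k column indices of X by the greedy mRMR rule."""
    n_features = X.shape[1]
    # Relevance: mutual information between each feature and the class.
    relevance = np.array(
        [mutual_info_score(X[:, j], y) for j in range(n_features)]
    )
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            # Redundancy: mean mutual information with already selected features.
            redundancy = np.mean(
                [mutual_info_score(X[:, j], X[:, s]) for s in selected]
            )
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected


if __name__ == "__main__":
    # Small synthetic, discretized dataset (hypothetical example).
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(100, 20))   # 20 discrete features
    y = (X[:, 0] + X[:, 3]) % 2              # class depends on features 0 and 3
    print(greedy_mrmr(X, y, k=5))
```

The naive loop above recomputes feature-feature mutual information for every candidate at every step; the speedups reported in the paper come from eliminating exactly this kind of duplicated and redundant computation.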




Acknowledgments
This work was partially supported by the Grant-in-Aid for Scientific Research (JSPS KAKENHI Grant Number 17H00762) from the Japan Society for the Promotion of Science.
Cite this article
Angulo, A.P., Shin, K. Mrmr+ and Cfs+ feature selection algorithms for high-dimensional data. Appl Intell 49, 1954–1967 (2019). https://doi.org/10.1007/s10489-018-1381-1