Mrmr+ and Cfs+ feature selection algorithms for high-dimensional data

Applied Intelligence

Abstract

Feature selection is a central issue in machine learning and applied mathematics. Filter feature selection algorithms aim to solve the optimization problem of selecting a set of features that maximizes feature-class correlation and minimizes feature-feature correlation. Mrmr (Minimum Redundancy Maximum Relevance) and Cfs (Correlation-based Feature Selection) are among the best-known algorithms that find an approximate solution to this optimization problem. However, as the volume of available data grows over time, the feature selection process becomes more challenging. In this paper, we propose two new versions of Mrmr and Cfs that output the same feature set as the original algorithms but are considerably faster. Our novel algorithms are based on solving the duplication and redundancy problems intrinsic to the original algorithms. We applied our proposals to thirty datasets related to microarray and cancer analysis. Experiments revealed that the proposed algorithms Mrmr+ and Cfs+ are, on average, fourteen and three times faster than the original algorithms, respectively.
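
To make the optimization problem concrete, here is a minimal sketch of a classic greedy Mrmr-style selection for discrete features. It is not the Mrmr+ algorithm proposed in the paper. It assumes the common "relevance minus mean redundancy" criterion with mutual information as the correlation measure, and the names `mutual_information` and `greedy_mrmr`, as well as the toy data, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def greedy_mrmr(features, target, k):
    """Greedily pick k feature names (illustrative sketch, not Mrmr+).

    Each step adds the candidate with the largest value of
    relevance(f) - mean redundancy(f, already selected features).
    """
    relevance = {
        name: mutual_information(col, target) for name, col in features.items()
    }
    selected = [max(relevance, key=relevance.get)]
    # Running sum of MI between each candidate and the features selected so far.
    redundancy = {name: 0.0 for name in features if name != selected[0]}
    while len(selected) < k and redundancy:
        last = selected[-1]
        for name in redundancy:
            redundancy[name] += mutual_information(features[name], features[last])
        best = max(
            redundancy,
            key=lambda f: relevance[f] - redundancy[f] / len(selected),
        )
        selected.append(best)
        del redundancy[best]
    return selected


# Toy data (assumed): "b" duplicates "a", while "c" is independent of "a".
toy_features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 0, 0, 1, 1, 1, 1],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
toy_target = [0, 1, 0, 1, 2, 3, 2, 3]  # class = 2 * a + c

print(greedy_mrmr(toy_features, toy_target, 2))  # prints ['a', 'c']
```

All three toy features are equally relevant to the class, but the duplicate `b` is fully redundant with `a`, so the sketch prefers `c` as the second feature. Most of the run time goes into the pairwise feature-feature computations inside the loop, which is the kind of cost that grows quickly on high-dimensional data.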


Acknowledgments

This work was partially supported by the Grant-in-Aid for Scientific Research (JSPS KAKENHI Grant Number 17H00762) from the Japan Society for the Promotion of Science.

Corresponding author

Correspondence to Adrian Pino Angulo.

About this article

Cite this article

Angulo, A.P., Shin, K. Mrmr+ and Cfs+ feature selection algorithms for high-dimensional data. Appl Intell 49, 1954–1967 (2019). https://doi.org/10.1007/s10489-018-1381-1

  • DOI: https://doi.org/10.1007/s10489-018-1381-1
