On the scalability of feature selection methods on high-dimensional data

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

With the recent explosion of high-dimensional data, researchers in machine learning have become interested not only in accuracy but also in scalability. Although the scalability of learning methods is a trending issue, the scalability of feature selection methods has not received the same attention. This research analyzes the scalability of state-of-the-art feature selection methods belonging to the filter, embedded and wrapper approaches. For this purpose, several new measures are presented, based not only on accuracy but also on execution time and stability. Results on seven classical artificial datasets are presented and discussed, as well as two case studies analyzing the particularities of microarray data and the effect of redundancy. To check whether the results generalize, we also included experiments with two real datasets. As expected, filters are the most scalable feature selection approach, with INTERACT, ReliefF and mRMR being the most accurate methods.
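To illustrate the filter approach and the stability measures the abstract refers to, the following is a minimal Python sketch. It is not the paper's experimental code: the function names and toy setup are ours, the filter shown is a plain univariate mutual-information ranking (unlike mRMR, it does not penalize redundancy among selected features), and the stability score is a simple average pairwise Jaccard similarity, one of several measures proposed in the literature.

```python
import random
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def filter_rank(X, y, k):
    """Univariate filter: indices of the k features most relevant to y."""
    scores = [(mutual_information([row[j] for row in X], y), j)
              for j in range(len(X[0]))]
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]

def jaccard_stability(subsets):
    """Average pairwise Jaccard similarity between the feature subsets
    selected on different samples of the data (higher = more stable)."""
    sims = [len(set(a) & set(b)) / len(set(a) | set(b))
            for i, a in enumerate(subsets) for b in subsets[i + 1:]]
    return sum(sims) / len(sims)

# Toy data: feature 0 duplicates the class label, the other four are noise,
# so a relevance-based filter should rank feature 0 first.
random.seed(0)
y = [random.randint(0, 1) for _ in range(200)]
X = [[label] + [random.randint(0, 1) for _ in range(4)] for label in y]
selected = filter_rank(X, y, 2)
```

Scalability enters the picture because a filter of this kind makes a single pass over each feature independently, so its cost grows linearly with the number of features, whereas wrapper methods must retrain a classifier for every candidate subset.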



Notes

  1. http://www.lidiagroup.org/index.php/en/materials-en.html.

  2. Colon Cancer dataset is available on http://datam.i2r.a-star.edu.sg/datasets/krbd.

  3. KDD Cup 99 dataset is available on http://kdd.ics.uci.edu/kddcup99/kddcup99.html.

  4. http://www.lidiagroup.org/index.php/en/materials-en.html.


Acknowledgements

This research has been financially supported in part by the Ministerio de Economía y Competitividad of the Spanish Government (Research Project TIN2015-65069-C2-1-R), by European Union FEDER funds and by the Consellería de Industria of the Xunta de Galicia (research project GRC2014/035). Financial support from the Xunta de Galicia (Centro singular de investigación de Galicia accreditation 2016–2019) and the European Union (European Regional Development Fund-ERDF) is gratefully acknowledged (Research Project ED431G/01).

Author information

Corresponding author

Correspondence to V. Bolón-Canedo.

About this article

Cite this article

Bolón-Canedo, V., Rego-Fernández, D., Peteiro-Barral, D. et al. On the scalability of feature selection methods on high-dimensional data. Knowl Inf Syst 56, 395–442 (2018). https://doi.org/10.1007/s10115-017-1140-3

