Abstract
Many machine learning methods can produce variable importance scores that express the usefulness of each feature in the context of the produced model. Those scores alone are not sufficient to perform feature selection, especially when an all relevant selection is required. Wrapper methods exist that aim to solve this problem, mostly by estimating the expected distribution of the importance of irrelevant features. Such estimation, however, often requires substantial computational effort.
In this paper I propose a method of incorporating such estimation into the training process of a random ferns classifier and evaluate it as an all relevant feature selector, both directly and as part of a dedicated wrapper approach. The obtained results demonstrate its effectiveness and computational efficiency.
Acknowledgements
This work has been financed by the National Science Centre, grant 2011/01/N/ST6/07035, as well as with the support of the OCEAN—Open Centre for Data and Data Analysis Project, co-financed by the European Regional Development Fund under the Innovative Economy Operational Programme. Computations were performed at ICM, grant G48-6.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Kursa, M.B. (2017). Efficient All Relevant Feature Selection with Random Ferns. In: Kryszkiewicz, M., Appice, A., Ślęzak, D., Rybinski, H., Skowron, A., Raś, Z. (eds) Foundations of Intelligent Systems. ISMIS 2017. Lecture Notes in Computer Science, vol 10352. Springer, Cham. https://doi.org/10.1007/978-3-319-60438-1_30
DOI: https://doi.org/10.1007/978-3-319-60438-1_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60437-4
Online ISBN: 978-3-319-60438-1
eBook Packages: Computer Science (R0)