Selection of Significant Features Using Monte Carlo Feature Selection

Bornelöv, Susanne; Komorowski, Jan

doi:10.1007/978-3-319-18781-5_2


Chapter
First Online: 01 January 2015


Part of the book series: Studies in Computational Intelligence (SCI, volume 605)

Abstract

Feature selection methods identify subsets of features in large datasets. Such methods have become popular in data-intensive areas, and performing feature selection prior to model construction may reduce the computational cost and improve the model quality. Monte Carlo Feature Selection (MCFS) is a feature selection method aimed at finding features to use for classification. Here we suggest a strategy that uses a z-test to compute the significance of a feature scored by MCFS. Using simulated data with both informative and random features, we compared the z-test with a permutation test and with a test built into the MCFS software. The z-test agreed more closely with the permutation test than the built-in test did. Furthermore, it avoided a bias related to the distribution of feature values that may have affected the built-in test. In conclusion, the suggested method has the potential to improve feature selection using MCFS.
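The abstract does not spell out how the z-test is computed, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes that each feature has an importance score, that null scores for the same feature are obtained by rerunning the scoring on data with permuted class labels, and that those null scores are roughly normal. The function names, the score values, and the number of permutations are all hypothetical.

```python
import math
import random
import statistics


def z_test_p_value(observed, null_scores):
    """Upper-tail p-value from a normal approximation of the null scores.

    Assumption: null_scores are importance scores of the same feature
    computed on label-permuted data.
    """
    mu = statistics.mean(null_scores)
    sigma = statistics.stdev(null_scores)  # sample standard deviation
    z = (observed - mu) / sigma
    # P(Z >= z) for a standard normal variable.
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p


def permutation_p_value(observed, null_scores):
    """Empirical upper-tail p-value with an add-one correction."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


# Hypothetical data: 200 simulated null scores and one observed score.
rng = random.Random(0)
null_scores = [rng.gauss(0.10, 0.02) for _ in range(200)]
observed = 0.20

z, p_z = z_test_p_value(observed, null_scores)
p_perm = permutation_p_value(observed, null_scores)
print(f"z = {z:.2f}, z-test p = {p_z:.2e}, permutation p = {p_perm:.4f}")
```

One reason a z-test can be attractive in this setting is that the empirical permutation p-value cannot fall below 1/(n+1) for n permutations, whereas the normal approximation can yield smaller p-values from the same number of runs, at the cost of relying on the normality assumption.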


Acknowledgments

We wish to thank the reviewers for insightful comments that helped improve this paper. The authors were in part supported by an ESSENCE grant, by Uppsala University and by the Institute of Computer Science, Polish Academy of Sciences.

Author information

Authors and Affiliations

Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
Susanne Bornelöv & Jan Komorowski
Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
Susanne Bornelöv
Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
Jan Komorowski

Corresponding author

Correspondence to Jan Komorowski.

Editor information

Editors and Affiliations

Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
Stan Matwin
Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland, and Warsaw University of Technology, Warsaw, Poland
Jan Mielniczuk


Cite this chapter

Bornelöv, S., Komorowski, J. (2016). Selection of Significant Features Using Monte Carlo Feature Selection. In: Matwin, S., Mielniczuk, J. (eds) Challenges in Computational Statistics and Data Mining. Studies in Computational Intelligence, vol 605. Springer, Cham. https://doi.org/10.1007/978-3-319-18781-5_2


DOI: https://doi.org/10.1007/978-3-319-18781-5_2
Published: 28 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18780-8
Online ISBN: 978-3-319-18781-5
eBook Packages: Engineering, Engineering (R0)
