Abstract
This paper demonstrates the capability of a modified version of the Monte Carlo Feature Selection (MCFS) algorithm to detect strong synthetic benchmark feature interactions in a set of mixed categorical and continuous variables. The original MCFS approach to detecting feature interactions, which relies on analysing the structure of trained decision trees, is compared with our modified approach, which consists of a series of variable permutations combined with a decomposition of each feature's total effect into a main effect and interaction effects. A comparison with unmodified MCFS, which by default handles only classification problems using C4.5 decision trees, shows that the new approach is slightly more robust. Furthermore, the decomposition approach is flexible, as it allows different types of models to be plugged into MCFS. This opens the way to handling high-throughput supervised feature selection and interaction mining problems for classification, regression and censored survival decision vectors.
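To make the idea of separating main effects from interaction effects by permutation more concrete, the following is a minimal sketch of one possible reading, not the procedure used in the paper. It assumes that the effect of a feature set is the increase in mean squared error after shuffling those columns, and that a pairwise interaction shows up as the part of the joint effect not explained by the sum of the two single-feature effects. The `model` function is a hypothetical stand-in for a trained model (no trees are fitted here), and all names and the toy data are illustrative assumptions.

```python
import random


def model(row):
    # Hypothetical stand-in for a trained model (assumption, not MCFS):
    # binary features a and b interact through XOR, continuous c is additive.
    a, b, c = row
    return (a ^ b) + 2.0 * c


def mse(rows, targets):
    # Mean squared error of the stand-in model on the given rows.
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permuted(rows, cols, rng):
    # Copy the data and shuffle each listed column independently.
    out = [list(r) for r in rows]
    for j in cols:
        col = [r[j] for r in rows]
        rng.shuffle(col)
        for r, v in zip(out, col):
            r[j] = v
    return out


def effect(rows, targets, cols, rng, repeats=10):
    # Average increase in error caused by permuting the listed columns.
    base = mse(rows, targets)
    total = 0.0
    for _ in range(repeats):
        total += mse(permuted(rows, cols, rng), targets) - base
    return total / repeats


def interaction(rows, targets, i, j, rng):
    # Joint effect minus the two single-feature effects; values far from
    # zero suggest that features i and j do not act additively.
    joint = effect(rows, targets, [i, j], rng)
    return joint - effect(rows, targets, [i], rng) - effect(rows, targets, [j], rng)


rng = random.Random(0)
# Toy data with two categorical (binary) features and one continuous feature.
rows = [(rng.randint(0, 1), rng.randint(0, 1), rng.random()) for _ in range(2000)]
targets = [model(r) for r in rows]

ab = interaction(rows, targets, 0, 1, rng)  # interacting pair
ac = interaction(rows, targets, 0, 2, rng)  # additive pair
print("a,b:", round(ab, 3), "a,c:", round(ac, 3))
```

In this toy setting the interacting pair should give a value clearly away from zero, while the additive pair should stay close to zero. Because the sketch only needs predictions, any model type could in principle replace `model`, which is the kind of flexibility the abstract refers to.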
The original version of this chapter was revised: a misspelt author name has been corrected. The erratum to this chapter is available at http://dx.doi.org/10.1007/978-3-319-60816-7_40
Acknowledgements
We would like to thank Prof. Jacek Koronacki (Polish Academy of Sciences) and the anonymous reviewers for helping to improve the quality of the paper.
The work was financially supported by internal grant BK/213/Rau1/2016/10. Calculations were carried out using the computer cluster Ziemowit (http://www.ziemowit.hpc.polsl.pl) funded by the Silesian BIO-FARMA project No. POIG.02.01.00-00-166/08 in the Computational Biology and Bioinformatics Laboratory of the Biotechnology Centre in the Silesian University of Technology.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Krol, L., Polanska, J. (2017). Multidimensional Feature Selection and Interaction Mining with Decision Tree Based Ensemble Methods. In: Fdez-Riverola, F., Mohamad, M., Rocha, M., De Paz, J., Pinto, T. (eds) 11th International Conference on Practical Applications of Computational Biology & Bioinformatics. PACBB 2017. Advances in Intelligent Systems and Computing, vol 616. Springer, Cham. https://doi.org/10.1007/978-3-319-60816-7_15
DOI: https://doi.org/10.1007/978-3-319-60816-7_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60815-0
Online ISBN: 978-3-319-60816-7
eBook Packages: Engineering, Engineering (R0)