Multidimensional Feature Selection and Interaction Mining with Decision Tree Based Ensemble Methods

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 616)

Abstract

This paper demonstrates the ability of a modified version of the Monte Carlo Feature Selection (MCFS) algorithm to detect strong feature interactions in synthetic benchmark data comprising a mix of categorical and continuous variables. The original MCFS approach to detecting feature interactions, which relies on analysing the structure of trained decision trees, is compared with our modified approach, which combines a series of variable permutations with a decomposition of each feature’s total effect into a main effect and interaction effects. A comparison with unmodified MCFS, which by default handles only classification problems using C4.5 decision trees, shows that the new approach is slightly more robust. Furthermore, the decomposition approach is flexible, allowing different types of models to be plugged into MCFS. This opens the way to high-throughput supervised feature selection and interaction mining for classification, regression, and censored survival decision vectors.
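As a rough illustration of the Monte Carlo idea behind MCFS, the Python sketch below repeatedly draws random feature subsets and random train/test splits, fits a one-split decision stump on each training split, and credits the chosen feature with its information gain weighted by the stump’s balanced test accuracy raised to a power `u`. This is a deliberate simplification, not the authors’ implementation: the stump stands in for the C4.5 trees MCFS actually grows, and the full MCFS relative-importance score also accumulates information gain over every split node of each tree, weighted by node size. All function names, the parameter defaults (`s`, `m`, `t`, `u`) and the toy data are illustrative assumptions.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def majority(labels):
    """Most frequent class label."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Best single threshold split over `features`, scored by information gain.

    Returns (gain, feature, threshold, left_label, right_label). This one-split
    stump is a simplified stand-in for the C4.5 trees grown by MCFS.
    """
    base, fallback = entropy(y), majority(y)
    best = (0.0, None, None, fallback, fallback)
    for f in features:
        # Every observed value except the largest gives a non-empty split.
        for thr in sorted({row[f] for row in X})[:-1]:
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            if gain > best[0]:
                best = (gain, f, thr, majority(left), majority(right))
    return best


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall on the test split."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def mcfs_stump_importance(X, y, s=40, m=3, t=3, u=1.0, seed=0):
    """Monte Carlo importance: s random subsets of m features, t random
    2/3-1/3 train/test splits per subset, credit += (accuracy ** u) * gain."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    ri = [0.0] * d
    for _ in range(s):
        subset = rng.sample(range(d), m)
        for _ in range(t):
            order = rng.sample(range(n), n)
            train, test = order[: 2 * n // 3], order[2 * n // 3:]
            gain, f, thr, left_lbl, right_lbl = fit_stump(
                [X[i] for i in train], [y[i] for i in train], subset)
            if f is None:  # no split improved purity
                continue
            pred = [left_lbl if X[i][f] <= thr else right_lbl for i in test]
            acc = balanced_accuracy([y[i] for i in test], pred)
            ri[f] += acc ** u * gain
    return ri


# Hypothetical toy data: only x0 carries signal (about 10% label noise).
rng = random.Random(1)
X = [[rng.random() for _ in range(10)] for _ in range(120)]
y = [int((row[0] > 0.5) != (rng.random() < 0.1)) for row in X]
ri = mcfs_stump_importance(X, y)
ranking = sorted(range(10), key=lambda f: -ri[f])
print(ranking[:3])
```

In this toy setup x0 should top the ranking, since it is the only feature whose split generalises to the held-out third of the data; a stump cannot, however, express interactions, which is where the tree structure (or a permutation-based decomposition) comes in.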
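A decomposition of a feature’s total effect into a main effect and pairwise interaction effects can be sketched with permutations. The Python code below is a minimal sketch under our own assumptions, not the exact procedure used in the paper: the total effect TE(i) is the increase in mean squared error when column i is permuted; the pairwise interaction I(i, j) is how much less loss the joint permutation of i and j adds than the two single permutations combined; and the main effect is what remains, ME(i) = TE(i) - Σ_j I(i, j). Any callable model can be passed in, mirroring the plug-in flexibility described in the abstract. The function names, the toy function `true_model` and the data are hypothetical.

```python
import random
from itertools import combinations


def mse(model, X, y):
    """Mean squared error of `model` (any callable row -> prediction)."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permute_columns(X, perms):
    """Copy of X in which each column c in `perms` is reordered by perms[c]."""
    Xp = [list(row) for row in X]
    for c, perm in perms.items():
        for r, src in enumerate(perm):
            Xp[r][c] = X[src][c]
    return Xp


def decompose_effects(model, X, y, seed=0):
    """Permutation-based total, pairwise-interaction and main effects.

    TE(i)   = loss(permute i) - loss(original)
    I(i, j) = TE(i) + TE(j) - [loss(permute i and j) - loss(original)]
    ME(i)   = TE(i) - sum over j != i of I(i, j)
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    base = mse(model, X, y)
    # One fixed permutation per feature, reused in the single and the joint
    # permutations, so that their differences carry little sampling noise.
    perms = {c: rng.sample(range(n), n) for c in range(p)}
    te = [mse(model, permute_columns(X, {c: perms[c]}), y) - base for c in range(p)]
    inter = {}
    for i, j in combinations(range(p), 2):
        joint = mse(model, permute_columns(X, {i: perms[i], j: perms[j]}), y) - base
        inter[(i, j)] = te[i] + te[j] - joint
    me = [te[i] - sum(v for pair, v in inter.items() if i in pair) for i in range(p)]
    return te, inter, me


# Hypothetical synthetic benchmark: x0 and x1 interact, x2 has a pure main
# effect, x3 is noise. The true function itself plays the role of the model.
def true_model(row):
    return row[0] * row[1] + row[2]


rng = random.Random(42)
X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5000)]
y = [true_model(row) for row in X]
te, inter, me = decompose_effects(true_model, X, y)
print("total:", [round(v, 3) for v in te])
print("I(0,1):", round(inter[(0, 1)], 3), " main:", [round(v, 3) for v in me])
```

For features drawn uniformly from [-1, 1], the expected values are TE(0) = TE(1) = 2/9, TE(2) = 2/3 and TE(3) = 0, with I(0, 1) close to 2/9 and the main effects of x0 and x1 close to zero, so the product term shows up purely as an interaction while x2 keeps a main effect of about 2/3.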

The original version of this chapter was revised: a misspelt author name has been corrected. The erratum to this chapter is available at https://doi.org/10.1007/978-3-319-60816-7_40.

Acknowledgements

We would like to thank Prof. Jacek Koronacki (Polish Academy of Sciences) as well as the anonymous reviewers for helping to improve the quality of the paper.

The work was financially supported by the internal grant BK/213/Rau1/2016/10. Calculations were carried out on the Ziemowit computer cluster (http://www.ziemowit.hpc.polsl.pl), funded by the Silesian BIO-FARMA project No. POIG.02.01.00-00-166/08, in the Computational Biology and Bioinformatics Laboratory of the Biotechnology Centre at the Silesian University of Technology.

Author information

Correspondence to Lukasz Krol.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Krol, L., Polanska, J. (2017). Multidimensional Feature Selection and Interaction Mining with Decision Tree Based Ensemble Methods. In: Fdez-Riverola, F., Mohamad, M., Rocha, M., De Paz, J., Pinto, T. (eds) 11th International Conference on Practical Applications of Computational Biology & Bioinformatics. PACBB 2017. Advances in Intelligent Systems and Computing, vol 616. Springer, Cham. https://doi.org/10.1007/978-3-319-60816-7_15

  • DOI: https://doi.org/10.1007/978-3-319-60816-7_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60815-0

  • Online ISBN: 978-3-319-60816-7

  • eBook Packages: Engineering, Engineering (R0)
