Stability in Biomarker Discovery: Does Ensemble Feature Selection Really Help?

  • Conference paper
Current Approaches in Applied Artificial Intelligence (IEA/AIE 2015)

Abstract

Ensemble feature selection has recently been explored as a promising paradigm for improving the stability, i.e. the robustness with respect to sample variation, of subsets of informative features extracted from high-dimensional domains, including genetics and medicine. Although recent literature discusses a number of cases where ensemble approaches appear to provide more stable results, especially in the context of biomarker discovery, there is a lack of systematic studies offering insight into when, and to what extent, an ensemble method is preferable to a simple one. Using a well-known benchmark from the genomics domain, this paper presents an empirical study that evaluates ten selection methods, representative of different selection approaches, and investigates whether they become significantly more stable when used in an ensemble fashion. The results indicate both benefits and limitations of the ensemble paradigm in terms of stability.
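
As a rough illustration of the comparison described in the abstract, the following Python sketch contrasts a simple selector with a bootstrap-based ensemble version and estimates the stability of each. It is not the procedure used in the paper: the synthetic data, the univariate score in `rank_features`, the mean-rank aggregation, the average pairwise Jaccard similarity used as the stability measure, and all function names and parameter values are assumptions made only for this example.

```python
import itertools
import numpy as np

def rank_features(X, y):
    """Hypothetical univariate filter: |difference of class means| / feature std."""
    a, b = X[y == 0], X[y == 1]
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (X.std(axis=0) + 1e-12)

def select_top_k(X, y, k):
    """Simple selection: keep the k best-scoring features."""
    scores = rank_features(X, y)
    return set(np.argsort(-scores)[:k].tolist())

def stratified_indices(y, rng, fraction=1.0, replace=True):
    """Draw indices class by class so that both classes are always present."""
    parts = []
    for c in np.unique(y):
        members = np.flatnonzero(y == c)
        size = max(2, int(round(fraction * members.size)))
        parts.append(rng.choice(members, size=size, replace=replace))
    return np.concatenate(parts)

def ensemble_top_k(X, y, k, n_bootstraps, rng):
    """Ensemble selection: rank on bootstrap samples, aggregate by mean rank."""
    mean_rank = np.zeros(X.shape[1])
    for _ in range(n_bootstraps):
        idx = stratified_indices(y, rng)          # bootstrap (with replacement)
        scores = rank_features(X[idx], y[idx])
        mean_rank += np.argsort(np.argsort(-scores))  # rank 0 = best feature
    return set(np.argsort(mean_rank)[:k].tolist())

def jaccard(a, b):
    return len(a & b) / len(a | b)

def stability(subsets):
    """Average pairwise Jaccard similarity between selected subsets."""
    pairs = list(itertools.combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Synthetic two-class data (assumption): 60 samples, 200 features,
# of which the first 10 are shifted in class 1.
rng = np.random.default_rng(0)
n, p, k = 60, 200, 10
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :10] += 1.0

# Perturb the data by repeated 80% subsampling and select on each subsample.
simple, ensemble = [], []
for _ in range(10):
    idx = stratified_indices(y, rng, fraction=0.8, replace=False)
    simple.append(select_top_k(X[idx], y[idx], k))
    ensemble.append(ensemble_top_k(X[idx], y[idx], k, 20, rng))

print("simple stability:  ", stability(simple))
print("ensemble stability:", stability(ensemble))
```

A value close to 1 means that nearly the same features are selected regardless of which samples are drawn. Whether the ensemble variant scores higher depends on the selector and the data, which is the question the study examines empirically.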

Author information

Corresponding author

Correspondence to Barbara Pes.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Dessì, N., Pes, B. (2015). Stability in Biomarker Discovery: Does Ensemble Feature Selection Really Help? In: Ali, M., Kwon, Y., Lee, CH., Kim, J., Kim, Y. (eds) Current Approaches in Applied Artificial Intelligence. IEA/AIE 2015. Lecture Notes in Computer Science, vol 9101. Springer, Cham. https://doi.org/10.1007/978-3-319-19066-2_19

  • DOI: https://doi.org/10.1007/978-3-319-19066-2_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19065-5

  • Online ISBN: 978-3-319-19066-2

  • eBook Packages: Computer Science, Computer Science (R0)
