A Deceiving Charm of Feature Selection: The Microarray Case Study

  • Conference paper
Man-Machine Interactions 2

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 103)

Abstract

Microarray analysis has become a significant application of machine learning in molecular biology. Datasets obtained with this method consist of tens of thousands of attributes, usually describing only tens of objects. Such a setting makes some form of feature selection an inevitable step of the analysis, mostly to reduce the feature set to a manageable size, but also to gain biological insight into the mechanisms of the investigated process. In this paper we present a reanalysis of a previously published late radiation toxicity prediction problem. Using that striking example, we show how futile it may be to rely on non-validated feature selection and how even advanced algorithms fail to distinguish noise from signal when the latter is weak. We also propose methods of detecting and dealing with these problems.
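The risk of relying on non-validated feature selection can be illustrated with a small synthetic experiment. The sketch below is not the method or the data of this paper: it assumes pure Gaussian noise with arbitrary class labels, a simple mean-difference ranking score, and a nearest-centroid classifier, all chosen here only for illustration (the function names and parameter values are hypothetical). It compares leave-one-out accuracy when features are selected once on all objects against accuracy when selection is repeated inside every fold.

```python
import numpy as np

def top_k_features(X, y, k):
    """Return indices of the k features with the largest absolute
    class-mean difference, scaled by the per-feature standard deviation."""
    a, b = X[y == 0], X[y == 1]
    score = np.abs(a.mean(axis=0) - b.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    return np.argsort(score)[-k:]

def nearest_centroid_predict(X_train, y_train, x):
    """Assign x to the class whose training centroid is closer."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    return int(np.linalg.norm(x - c1) < np.linalg.norm(x - c0))

def loo_accuracy(X, y, k, select_inside_fold):
    """Leave-one-out accuracy with feature selection done either once on
    the full data (non-validated) or separately inside each fold."""
    n = len(y)
    global_features = top_k_features(X, y, k)  # has seen every object
    hits = 0
    for i in range(n):
        train = np.arange(n) != i
        if select_inside_fold:
            feats = top_k_features(X[train], y[train], k)
        else:
            feats = global_features
        pred = nearest_centroid_predict(X[train][:, feats], y[train], X[i, feats])
        hits += int(pred == y[i])
    return hits / n

# Assumed toy dimensions: many attributes, few objects, no real signal.
rng = np.random.default_rng(0)
n_samples, n_features, k = 40, 5000, 20
X = rng.normal(size=(n_samples, n_features))
y = np.array([0, 1] * (n_samples // 2))  # labels unrelated to X

biased = loo_accuracy(X, y, k, select_inside_fold=False)
honest = loo_accuracy(X, y, k, select_inside_fold=True)
print(f"selection outside cross-validation: {biased:.2f}")
print(f"selection inside cross-validation:  {honest:.2f}")
```

Because the data contain no signal, any accuracy well above chance in the first setting comes only from the selection step having seen the held-out objects. Repeating the selection inside each fold removes that leak.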



Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kursa, M.B., Rudnicki, W.R. (2011). A Deceiving Charm of Feature Selection: The Microarray Case Study. In: Czachórski, T., Kozielski, S., Stańczyk, U. (eds) Man-Machine Interactions 2. Advances in Intelligent and Soft Computing, vol 103. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23169-8_16

  • DOI: https://doi.org/10.1007/978-3-642-23169-8_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23168-1

  • Online ISBN: 978-3-642-23169-8

  • eBook Packages: Engineering, Engineering (R0)
