A Comparative Study of Microarray Data Classification Methods Based on Ensemble Biological Relevant Gene Sets

  • Conference paper
Advances in Bioinformatics

Abstract

In this work, we study several ensemble alternatives for classifying microarray data using prior knowledge known to be biologically relevant to the target disease. The goal is to obtain an accurate ensemble classification model that outperforms baseline classifiers by introducing diversity in the form of different gene sets. The proposed model takes advantage of WhichGenes, a gene set building tool that automatically extracts lists of genes from multiple sparse data sources. Preliminary results on different datasets and several gene sets show that the proposal can outperform basic classifiers by using existing prior knowledge.
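The idea of introducing diversity through different gene sets can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid base classifier, the majority-vote combination, the gene identifiers (G1 to G4), the gene set names and the toy expression values are all assumptions made for illustration, and WhichGenes is not called (the gene sets are simply written out by hand).

```python
from collections import Counter

def train_centroids(samples, labels, genes):
    """Per-class mean expression, restricted to the genes of one gene set."""
    counts = Counter(labels)
    sums = {}
    for sample, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(genes))
        for i, gene in enumerate(genes):
            acc[i] += sample[gene]
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict_centroid(centroids, sample, genes):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((sample[g] - centroid[i]) ** 2 for i, g in enumerate(genes))
    return min(sorted(centroids), key=lambda label: dist(centroids[label]))

def train_ensemble(samples, labels, gene_sets):
    """Train one base classifier per gene set (assumed source of diversity)."""
    return {name: (genes, train_centroids(samples, labels, genes))
            for name, genes in gene_sets.items()}

def predict_ensemble(ensemble, sample):
    """Combine base predictions by majority vote; ties go to the first label alphabetically."""
    votes = Counter(predict_centroid(centroids, sample, genes)
                    for genes, centroids in ensemble.values())
    return max(sorted(votes), key=lambda label: votes[label])

# Hypothetical gene sets, standing in for lists exported from a gene set tool.
gene_sets = {
    "set_a": ["G1", "G2"],
    "set_b": ["G3"],
    "set_c": ["G2", "G4"],
}

# Toy expression profiles (gene -> expression level) with made-up values.
samples = [
    {"G1": 5.0, "G2": 4.8, "G3": 1.0, "G4": 0.9},
    {"G1": 5.2, "G2": 5.1, "G3": 1.2, "G4": 1.1},
    {"G1": 1.0, "G2": 1.2, "G3": 4.9, "G4": 5.0},
    {"G1": 0.8, "G2": 1.0, "G3": 5.1, "G4": 5.2},
]
labels = ["AML", "AML", "ALL", "ALL"]

ensemble = train_ensemble(samples, labels, gene_sets)
query = {"G1": 4.9, "G2": 5.0, "G3": 1.1, "G4": 1.0}
prediction = predict_ensemble(ensemble, query)
print(prediction)
```

Each base classifier sees only the genes in its own set, so the ensemble members disagree only where the gene sets carry different signals. A real study would replace the toy classifier and voting rule with whichever base learners and combination schemes are being compared.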

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Reboiro-Jato, M., Glez-Peña, D., Gálvez, J.F., Fidalgo, R.L., Díaz, F., Fdez-Riverola, F. (2010). A Comparative Study of Microarray Data Classification Methods Based on Ensemble Biological Relevant Gene Sets. In: Rocha, M.P., Riverola, F.F., Shatkay, H., Corchado, J.M. (eds) Advances in Bioinformatics. Advances in Intelligent and Soft Computing, vol 74. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13214-8_4

  • DOI: https://doi.org/10.1007/978-3-642-13214-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13213-1

  • Online ISBN: 978-3-642-13214-8

  • eBook Packages: Engineering, Engineering (R0)
