
Building Classification Models from Microarray Data with Tree-Based Classification Algorithms

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4830)

Abstract

Building classification models plays an important role in DNA microarray data analysis. An essential feature of DNA microarray data sets is that the number of input variables (genes) is far greater than the number of samples. As such, most classification schemes employ variable selection or feature selection methods to pre-process DNA microarray data. This paper investigates various aspects of building classification models from microarray data with tree-based classification algorithms, using Partial Least-Squares (PLS) regression as a feature selection method. Experimental results show that PLS regression is an appropriate feature selection method and that tree-based ensemble models are capable of delivering high-performance classification models for microarray data.
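As a concrete illustration of the pipeline described in the abstract, the minimal sketch below reduces a p >> n expression matrix to a few PLS components and then trains a tree-based ensemble on the reduced features. It is not the authors' implementation: it assumes scikit-learn (PLSRegression, RandomForestClassifier) and synthetic data in place of a real microarray set, and the component count and forest size are illustrative choices only.

    # Sketch only: PLS regression as supervised dimension reduction for
    # microarray-style data (many genes, few samples), followed by a
    # tree-based ensemble classifier. Data are synthetic; all parameter
    # values are illustrative assumptions.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.standard_normal((62, 2000))   # 62 samples, 2000 genes (p >> n)
    y = rng.integers(0, 2, size=62)       # binary class labels

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    # Step 1: fit PLS on the training data only; its latent component
    # scores replace the raw gene-expression values as features.
    pls = PLSRegression(n_components=5)
    pls.fit(X_train, y_train)
    X_train_pls = pls.transform(X_train)
    X_test_pls = pls.transform(X_test)

    # Step 2: train a tree-based ensemble on the reduced feature set.
    forest = RandomForestClassifier(n_estimators=500, random_state=0)
    forest.fit(X_train_pls, y_train)

    print("Test accuracy:", forest.score(X_test_pls, y_test))

Fitting the PLS projection on the training split only, and reusing it to transform the test split, keeps the class labels of test samples out of the feature-selection step, which is the usual precaution when the number of genes vastly exceeds the number of samples.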




Editor information

Mehmet A. Orgun, John Thornton


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tan, P.J., Dowe, D.L., Dix, T.I. (2007). Building Classification Models from Microarray Data with Tree-Based Classification Algorithms. In: Orgun, M.A., Thornton, J. (eds) AI 2007: Advances in Artificial Intelligence. AI 2007. Lecture Notes in Computer Science (LNAI), vol 4830. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76928-6_60

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-76928-6_60

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-76926-2

  • Online ISBN: 978-3-540-76928-6

  • eBook Packages: Computer Science, Computer Science (R0)
