
Building Classification Models from Microarray Data with Tree-Based Classification Algorithms

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4830)

Abstract

Building classification models plays an important role in DNA microarray data analysis. An essential feature of DNA microarray data sets is that the number of input variables (genes) is far greater than the number of samples. As such, most classification schemes employ variable selection or feature selection methods to pre-process DNA microarray data. This paper investigates various aspects of building classification models from microarray data with tree-based classification algorithms, using Partial Least-Squares (PLS) regression as a feature selection method. Experimental results show that PLS regression is an appropriate feature selection method and that tree-based ensemble models are capable of delivering high-performance classification models for microarray data.
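As a concrete illustration of the pipeline described in the abstract, the minimal sketch below reduces a p >> n expression matrix to a few PLS components and then trains a tree-based ensemble on the reduced features. It is not the authors' implementation: it assumes scikit-learn (PLSRegression, RandomForestClassifier) and synthetic data in place of a real microarray set, and the component count and forest size are illustrative choices only.

    # Sketch only: PLS regression as supervised dimension reduction for
    # microarray-style data (many genes, few samples), followed by a
    # tree-based ensemble classifier. Data are synthetic; all parameter
    # values are illustrative assumptions.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.standard_normal((62, 2000))   # 62 samples, 2000 genes (p >> n)
    y = rng.integers(0, 2, size=62)       # binary class labels

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    # Step 1: fit PLS on the training data only; its latent component
    # scores replace the raw gene-expression values as features.
    pls = PLSRegression(n_components=5)
    pls.fit(X_train, y_train)
    X_train_pls = pls.transform(X_train)
    X_test_pls = pls.transform(X_test)

    # Step 2: train a tree-based ensemble on the reduced feature set.
    forest = RandomForestClassifier(n_estimators=500, random_state=0)
    forest.fit(X_train_pls, y_train)

    print("Test accuracy:", forest.score(X_test_pls, y_test))

Fitting the PLS projection on the training split only, and reusing it to transform the test split, keeps the class labels of test samples out of the feature-selection step, which is the usual precaution when the number of genes vastly exceeds the number of samples.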




Editor information

Mehmet A. Orgun, John Thornton


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tan, P.J., Dowe, D.L., Dix, T.I. (2007). Building Classification Models from Microarray Data with Tree-Based Classification Algorithms. In: Orgun, M.A., Thornton, J. (eds) AI 2007: Advances in Artificial Intelligence. AI 2007. Lecture Notes in Computer Science (LNAI), vol 4830. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76928-6_60

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-76928-6_60

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-76926-2

  • Online ISBN: 978-3-540-76928-6

  • eBook Packages: Computer Science, Computer Science (R0)
