Knowledge Extraction from Microarray Datasets Using Combined Multiple Models to Predict Leukemia Types

Stiglic, Gregor; Khan, Nawaz; Kokol, Peter

doi:10.1007/978-3-540-78488-3_20

Part of the book series: Studies in Computational Intelligence (SCI, volume 118)

Summary

Recent advances in microarray technology make it possible to measure the expression levels of thousands of genes simultaneously. Analysis of such data helps identify different clinical outcomes that are caused by the expression of a few predictive genes. This chapter aims not only to select key predictive features for leukemia expression, but also to present rules that classify differentially expressed leukemia genes. Feature extraction and classification are carried out by combining the high accuracy of ensemble-based algorithms with the comprehensibility of a single decision tree. This combination allows exact rules to be derived that describe gene expression differences among significantly expressed genes in leukemia. Our results show that it is possible to achieve better accuracy in classifying leukemia without sacrificing comprehensibility.
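The idea of pairing an accurate ensemble with one readable tree can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes one common way of combining multiple models, in which a bagged ensemble labels randomly generated samples and a single small tree is then trained on the original plus the ensemble-labelled data. The expression values, the two hypothetical genes, the misclassification-based split rule, and all function names are illustrative assumptions.

```python
import random
from collections import Counter

def majority(labels):
    """Most frequent label in an iterable."""
    return Counter(labels).most_common(1)[0][0]

def best_split(X, y):
    """Return the (feature, threshold) with the fewest training errors, or None."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            errors = (len(left) - Counter(left).most_common(1)[0][1]
                      + len(right) - Counter(right).most_common(1)[0][1])
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return None if best is None else best[1:]

def fit_tree(X, y, depth):
    """A leaf is a class label; a node is (feature, threshold, left, right)."""
    split = best_split(X, y) if depth > 0 and len(set(y)) > 1 else None
    if split is None:
        return majority(y)
    f, t = split
    left = [(row, lab) for row, lab in zip(X, y) if row[f] <= t]
    right = [(row, lab) for row, lab in zip(X, y) if row[f] > t]
    return (f, t,
            fit_tree([r for r, _ in left], [l for _, l in left], depth - 1),
            fit_tree([r for r, _ in right], [l for _, l in right], depth - 1))

def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def fit_bagging(X, y, n_models, depth, rng):
    """Train one small tree per bootstrap sample."""
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in X]
        models.append(fit_tree([X[i] for i in idx], [y[i] for i in idx], depth))
    return models

def predict_ensemble(models, row):
    """Majority vote over all ensemble members."""
    return majority(predict(m, row) for m in models)

def combine_models(X, y, models, n_synthetic, depth, rng):
    """Fit one tree on the original data plus ensemble-labelled random samples."""
    lows = [min(col) for col in zip(*X)]
    highs = [max(col) for col in zip(*X)]
    synthetic = [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
                 for _ in range(n_synthetic)]
    labels = [predict_ensemble(models, row) for row in synthetic]
    return fit_tree(X + synthetic, y + labels, depth)

# Made-up expression values for two hypothetical genes (not data from the chapter).
X = [[0.10, 0.70], [0.15, 0.20], [0.20, 0.90], [0.25, 0.40], [0.30, 0.60],
     [0.70, 0.30], [0.75, 0.80], [0.80, 0.10], [0.85, 0.50], [0.90, 0.65]]
y = ["AML"] * 5 + ["ALL"] * 5

rng = random.Random(0)
models = fit_bagging(X, y, n_models=15, depth=2, rng=rng)
tree = combine_models(X, y, models, n_synthetic=200, depth=2, rng=rng)
fidelity = sum(predict(tree, r) == predict_ensemble(models, r) for r in X) / len(X)
print("single tree:", tree)
print("agreement with the ensemble on the toy samples:", fidelity)
```

The resulting tuple reads directly as a rule of the form "if gene f is at most t, predict one class, otherwise the other", which is the kind of comprehensible output a single tree offers and an ensemble vote does not.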

Author information

Authors and Affiliations

Faculty of Electrical Engineering and Computer Science, University of Maribor, 2000, Maribor, Slovenia
Gregor Stiglic & Peter Kokol
School of Computing Science, Middlesex University, The Burroughs, Hendon, London, NW4 4BT, UK
Nawaz Khan

Editor information

Editors and Affiliations

Department of Computer Science, San Jose State University, San Jose, CA, 95192, USA
Tsau Young Lin
Department of Computer Science and Information Systems, Kennesaw State University, Building 11, Room 3060 1000 Chastain Road, Kennesaw, GA, 30144, USA
Ying Xie
Department of Computer Science, The University at Stony Brook, Stony Brook, New York, 11794-4400, USA
Anita Wasilewska
Institute of Information Science, Academia Sinica, No 128, Academia Road, Section 2 Nankang, Taipei, 11529, Taiwan
Churn-Jung Liau

About this chapter

Cite this chapter

Stiglic, G., Khan, N., Kokol, P. (2008). Knowledge Extraction from Microarray Datasets Using Combined Multiple Models to Predict Leukemia Types. In: Lin, T.Y., Xie, Y., Wasilewska, A., Liau, CJ. (eds) Data Mining: Foundations and Practice. Studies in Computational Intelligence, vol 118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78488-3_20

DOI: https://doi.org/10.1007/978-3-540-78488-3_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78487-6
Online ISBN: 978-3-540-78488-3
eBook Packages: Engineering, Engineering (R0)
