Knowledge Extraction from Microarray Datasets Using Combined Multiple Models to Predict Leukemia Types

  • Chapter
Data Mining: Foundations and Practice

Part of the book series: Studies in Computational Intelligence ((SCI,volume 118))

Summary

Recent advances in microarray technology make it possible to measure the expression levels of thousands of genes simultaneously. Analyzing such data helps us identify different clinical outcomes that are caused by the expression of a few predictive genes. This chapter aims not only to select key predictive features for leukemia expression, but also to present the rules that classify differentially expressed leukemia genes. Feature extraction and classification are carried out by combining the high accuracy of ensemble-based algorithms with the comprehensibility of a single decision tree. This combination allows exact rules to be derived that describe gene expression differences among significantly expressed genes in leukemia. Our results show that it is possible to achieve better accuracy in classifying leukemia without sacrificing comprehensibility.
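The idea of pairing an ensemble with a single readable model can be sketched in a few lines. The following is only an illustration under stated assumptions, not the chapter's actual procedure: it uses bagged decision stumps as the ensemble, draws synthetic samples from the per-gene empirical values, labels them with the ensemble vote, and then fits one stump on the original plus synthetic data. The toy expression values, the class labels, and all function names (`fit_stump`, `fit_bagged_ensemble`, `combine`) are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-split rule (feature, threshold, left_label, right_label)
    that minimises the number of training errors."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(lab != l_lab for lab in left)
            errors += sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, (f, t, l_lab, r_lab))
    if best is None:
        # No feature has two distinct values: always predict the majority class.
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def fit_bagged_ensemble(X, y, n_models, rng):
    """Train one stump per bootstrap resample of the training data."""
    ensemble = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        ensemble.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return ensemble


def predict_ensemble(ensemble, row):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(s, row) for s in ensemble)
    return votes.most_common(1)[0][0]


def combine(X, y, n_models=25, n_synthetic=200, seed=0):
    """Label synthetic samples with the ensemble, then fit one readable rule."""
    rng = random.Random(seed)
    ensemble = fit_bagged_ensemble(X, y, n_models, rng)
    n_features = len(X[0])
    # Each synthetic value is drawn from the observed values of that feature.
    synthetic = [[rng.choice(X)[f] for f in range(n_features)]
                 for _ in range(n_synthetic)]
    synth_labels = [predict_ensemble(ensemble, row) for row in synthetic]
    return ensemble, fit_stump(X + synthetic, y + synth_labels)


# Hypothetical toy data: two "genes" per sample, two leukemia classes.
X = [[0.10, 0.9], [0.20, 0.4], [0.30, 0.7], [0.25, 0.2],
     [0.80, 0.5], [0.90, 0.1], [0.70, 0.8], [0.85, 0.3]]
y = ["ALL", "ALL", "ALL", "ALL", "AML", "AML", "AML", "AML"]

ensemble, rule = combine(X, y)
feature, threshold, left_label, right_label = rule
print(f"if gene_{feature} <= {threshold:.2f} then {left_label} else {right_label}")
```

The final model is a single if-then rule that a domain expert can read directly, while its training labels partly come from the ensemble vote. A real microarray analysis would need a feature selection step and a deeper tree than the stump used here.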

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Stiglic, G., Khan, N., Kokol, P. (2008). Knowledge Extraction from Microarray Datasets Using Combined Multiple Models to Predict Leukemia Types. In: Lin, T.Y., Xie, Y., Wasilewska, A., Liau, CJ. (eds) Data Mining: Foundations and Practice. Studies in Computational Intelligence, vol 118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78488-3_20

  • DOI: https://doi.org/10.1007/978-3-540-78488-3_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78487-6

  • Online ISBN: 978-3-540-78488-3

  • eBook Packages: Engineering, Engineering (R0)
