
Analyzing Gene Expression Data with Predictive Clustering Trees

  • Chapter
Inductive Databases and Constraint-Based Data Mining

Abstract

In this work we investigate the application of predictive clustering trees (PCTs) to the analysis of gene expression data. PCTs provide a flexible approach to both predictive and descriptive analysis, two tasks commonly performed on gene expression data. First, we use gene expression data to build predictive models for associated clinical data, comparing single-target with multi-target models. Relatedly, we use random forests of PCTs (single- and multi-target) to assess the importance of individual genes with respect to the clinical parameters. For a more descriptive analysis, we perform a so-called constrained clustering of expression data. Finally, we extend the descriptive analysis to take a temporal component into account by using PCTs to find descriptions of short time series of gene expression data.
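
As a rough illustration of the gene-ranking idea in the abstract, the sketch below ranks features by random-forest importance once against all targets jointly and once per target. It is only a stand-in under stated assumptions: scikit-learn's `RandomForestRegressor` is used in place of forests of PCTs, the data are synthetic, and the names `rank_genes`, `expression`, and `targets` are hypothetical and not taken from the chapter.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rank_genes(expression, targets, n_trees=100, seed=0):
    """Return gene (column) indices sorted from most to least important.

    `targets` may be 1-D (single-target) or 2-D (multi-target); scikit-learn
    fits one forest in either case. This is a stand-in, not a PCT forest.
    """
    forest = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    forest.fit(expression, targets)
    return np.argsort(forest.feature_importances_)[::-1]


# Synthetic data (an assumption for illustration): 120 samples, 30 "genes".
rng = np.random.default_rng(0)
n_samples, n_genes = 120, 30
expression = rng.normal(size=(n_samples, n_genes))

# Two made-up clinical targets: gene 0 drives both, gene 1 only the second.
noise = rng.normal(scale=0.1, size=(n_samples, 2))
targets = np.column_stack([
    2.0 * expression[:, 0],
    expression[:, 0] + expression[:, 1],
]) + noise

# One ranking from a multi-target forest, and one ranking per single target.
multi_ranking = rank_genes(expression, targets)
single_rankings = [rank_genes(expression, targets[:, j]) for j in range(2)]

print("multi-target top genes:", multi_ranking[:3])
for j, ranking in enumerate(single_rankings):
    print(f"single-target {j} top genes:", ranking[:3])
```

Comparing `multi_ranking` with the entries of `single_rankings` mirrors, in a simplified form, the single-target versus multi-target comparison the abstract mentions.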



Author information


Corresponding author

Correspondence to Ivica Slavkov.


Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Slavkov, I., Džeroski, S. (2010). Analyzing Gene Expression Data with Predictive Clustering Trees. In: Džeroski, S., Goethals, B., Panov, P. (eds) Inductive Databases and Constraint-Based Data Mining. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7738-0_16


  • DOI: https://doi.org/10.1007/978-1-4419-7738-0_16


  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-7737-3

  • Online ISBN: 978-1-4419-7738-0

  • eBook Packages: Computer Science, Computer Science (R0)
