Abstract
In this work we investigate the application of predictive clustering trees (PCTs) to the analysis of gene expression data. PCTs provide a flexible approach to both predictive and descriptive analysis, both of which are commonly applied to gene expression data. First, we use gene expression data to build predictive models of the associated clinical data, comparing single-target with multi-target models. Relatedly, we use random forests of PCTs (single-target and multi-target) to assess the importance of individual genes with respect to the clinical parameters. For a more descriptive analysis, we perform so-called constrained clustering of expression data. Finally, we extend the descriptive analysis to take a temporal component into account, using PCTs to find descriptions of short time series of gene expression data.
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Slavkov, I., Džeroski, S. (2010). Analyzing Gene Expression Data with Predictive Clustering Trees. In: Džeroski, S., Goethals, B., Panov, P. (eds) Inductive Databases and Constraint-Based Data Mining. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7738-0_16
DOI: https://doi.org/10.1007/978-1-4419-7738-0_16
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-7737-3
Online ISBN: 978-1-4419-7738-0
eBook Packages: Computer Science (R0)