Abstract
In this work we present Rotation clustering, a novel method for consensus clustering inspired by the classifier ensemble model Rotation Forest. We demonstrate the effectiveness of our method in a real-world application: the identification of enriched gene sets in a TCGA dataset derived from a clinical study on glioblastoma multiforme.
We compare the proposed approach with a classical clustering algorithm and with two other consensus methods. Our results show that the method is effective in finding significant groups of genes that share common expression patterns.
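The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a Rotation Forest-style consensus clustering could look, not the authors' implementation. Every name here (`rotate`, `kmeans`, `coassociation`, `consensus_labels`) is hypothetical. The sketch assumes that rows are the items to cluster (for example genes), that each base run applies PCA to random disjoint feature subsets fitted on a bootstrap sample, that k-means is the base clusterer, and that the runs are combined through a thresholded co-association matrix. Because a full orthogonal rotation leaves Euclidean distances unchanged, the sketch keeps only the leading components of each subset; that choice is ours and is not stated in the abstract.

```python
import numpy as np

def rotate(X, n_subsets, n_keep, rng):
    """PCA on random disjoint feature subsets (Rotation Forest-style, assumed)."""
    n_samples, n_features = X.shape
    subsets = np.array_split(rng.permutation(n_features), n_subsets)
    parts = []
    for idx in subsets:
        # Fit the axes on a bootstrap sample so that each run differs.
        rows = rng.choice(n_samples, size=n_samples, replace=True)
        block = X[np.ix_(rows, idx)]
        block = block - block.mean(axis=0)
        _, _, vt = np.linalg.svd(block, full_matrices=False)
        # Project all samples onto the leading n_keep principal axes.
        parts.append(X[:, idx] @ vt[:n_keep].T)
    return np.hstack(parts)

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's algorithm with random initial centers taken from the data."""
    centers = X[rng.choice(X.shape[0], size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def coassociation(X, k, n_runs=20, n_subsets=2, n_keep=1, seed=0):
    """Fraction of runs in which each pair of samples shares a cluster."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    C = np.zeros((n, n))
    for _ in range(n_runs):
        labels = kmeans(rotate(X, n_subsets, n_keep, rng), k, rng)
        C += labels[:, None] == labels[None, :]
    return C / n_runs

def consensus_labels(C, threshold=0.5):
    """Connected components of the graph linking pairs with C > threshold."""
    n = C.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        labels[i] = current
        stack = [i]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero((C[u] > threshold) & (labels < 0)):
                labels[v] = current
                stack.append(v)
        current += 1
    return labels

# Toy data: two well-separated groups of 20 samples with 4 features each.
data_rng = np.random.default_rng(0)
X = np.vstack([data_rng.normal(0.0, 0.5, (20, 4)),
               data_rng.normal(10.0, 0.5, (20, 4))])
C = coassociation(X, k=2)
labels = consensus_labels(C)
```

The co-association matrix is a common way to combine partitions in consensus clustering. Thresholding it into connected components is just one simple option; hierarchical clustering on `1 - C` would be another.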
P. Galdi and A. Serra—Equal contribution.
Acknowledgments
We would like to thank Teresa Savino and Luca Puglia for the helpful discussions.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Galdi, P., Serra, A., Tagliaferri, R. (2017). Rotation Clustering: A Consensus Clustering Approach to Cluster Gene Expression Data. In: Petrosino, A., Loia, V., Pedrycz, W. (eds) Fuzzy Logic and Soft Computing Applications. WILF 2016. Lecture Notes in Computer Science, vol 10147. Springer, Cham. https://doi.org/10.1007/978-3-319-52962-2_20
DOI: https://doi.org/10.1007/978-3-319-52962-2_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52961-5
Online ISBN: 978-3-319-52962-2
eBook Packages: Computer Science, Computer Science (R0)