Abstract
Recent advances in genomic technologies and the availability of large-scale datasets call for advanced data analysis techniques, such as data mining and statistical analysis, to name a few. A main goal in understanding cell mechanisms is to explain the relationships among genes and related molecular processes through the combined use of technological platforms and bioinformatics analysis. High-throughput platforms, such as microarrays, enable the investigation of the whole genome in a single experiment. Among the mining techniques proposed so far, cluster analysis has become a standard method for analysing microarray expression data. It can be used both for the initial screening of patients and for the extraction of molecular signatures of disease. Moreover, clustering can be profitably exploited to characterize genes of unknown function and to uncover patterns that can be interpreted as indications of the status of cellular processes. Finally, clustering biological data can be useful not only for exploring the data but also for discovering implicit links between the objects. However, a key feature lacking in many proposed approaches is the biological interpretation of the results obtained. In this paper, we discuss this issue by analysing the results of several clustering algorithms with respect to their biological relevance.
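The abstract does not state which algorithms or which relevance measure the paper uses, so the following is only a minimal sketch of the general idea under stated assumptions. It clusters gene expression profiles with plain k-means (Euclidean distance, centroids seeded with the first k genes) and then scores each cluster with a one-sided hypergeometric over-representation test against an annotation set. The hypergeometric test is a common way to judge biological relevance (for example against a Gene Ontology term) but is an assumption here, not necessarily the authors' method. The function names `kmeans` and `enrichment_pvalue`, the gene names, the expression values, and the annotated set are all hypothetical and invented for illustration.

```python
import math
from typing import Dict, List, Sequence, Set


def kmeans(profiles: Dict[str, Sequence[float]], k: int, iters: int = 100) -> Dict[int, List[str]]:
    """Lloyd's k-means on gene expression profiles (Euclidean distance).

    Assumption: centroids are seeded with the first k genes (insertion order)
    so the sketch is deterministic; k-means++ seeding is a common alternative.
    """
    genes = list(profiles)
    centroids = [list(profiles[g]) for g in genes[:k]]
    assignment: Dict[str, int] = {}
    for _ in range(iters):
        # Assignment step: each gene goes to its nearest centroid.
        new_assignment = {
            g: min(range(k), key=lambda c: math.dist(profiles[g], centroids[c]))
            for g in genes
        }
        if new_assignment == assignment:
            break  # converged: no gene changed cluster
        assignment = new_assignment
        # Update step: each non-empty centroid becomes the mean of its members.
        for c in range(k):
            members = [profiles[g] for g in genes if assignment[g] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    clusters: Dict[int, List[str]] = {c: [] for c in range(k)}
    for g in genes:
        clusters[assignment[g]].append(g)
    return clusters


def enrichment_pvalue(cluster: Set[str], annotated: Set[str], universe: Set[str]) -> float:
    """One-sided hypergeometric p-value P(X >= x): the chance that a random gene
    set of the cluster's size contains at least as many annotated genes."""
    n_total = len(universe)
    n_annot = len(annotated & universe)
    n_clust = len(cluster)
    hits = len(cluster & annotated)
    denom = math.comb(n_total, n_clust)
    tail = sum(
        math.comb(n_annot, i) * math.comb(n_total - n_annot, n_clust - i)
        for i in range(hits, min(n_annot, n_clust) + 1)
    )
    return tail / denom


# Hypothetical toy data: 6 genes measured at 4 time points.
profiles = {
    "g1": [0.1, 1.0, 2.0, 3.0],
    "g2": [0.0, 1.1, 2.1, 2.9],
    "g3": [0.2, 0.9, 1.9, 3.1],
    "g4": [3.0, 2.0, 1.0, 0.0],
    "g5": [2.9, 2.1, 0.9, 0.1],
    "g6": [3.1, 1.9, 1.1, 0.2],
}
universe = set(profiles)
annotated = {"g1", "g2", "g3"}  # hypothetical annotation term

clusters = kmeans(profiles, k=2)
for label, members in clusters.items():
    p = enrichment_pvalue(set(members), annotated, universe)
    print(label, sorted(members), round(p, 3))
```

On this toy input the sketch separates the three rising profiles from the three falling ones, and only the rising cluster is enriched for the hypothetical term (p = 0.05, versus 1.0 for the other cluster). A low internal-validity score and a low enrichment p-value need not coincide, which is the gap between statistical and biological quality that the abstract points to.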
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Guzzi, P.H., Masciari, E., Mazzeo, G.M., Zaniolo, C. (2014). A Discussion on the Biological Relevance of Clustering Results. In: Bursa, M., Khuri, S., Renda, M.E. (eds) Information Technology in Bio- and Medical Informatics. ITBAM 2014. Lecture Notes in Computer Science, vol 8649. Springer, Cham. https://doi.org/10.1007/978-3-319-10265-8_3
DOI: https://doi.org/10.1007/978-3-319-10265-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10264-1
Online ISBN: 978-3-319-10265-8
eBook Packages: Computer Science, Computer Science (R0)