A Discussion on the Biological Relevance of Clustering Results

Guzzi, Pietro Hiram; Masciari, Elio; Mazzeo, Giuseppe Massimiliano; Zaniolo, Carlo

doi:10.1007/978-3-319-10265-8_3

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8649)

Included in the following conference series:

International Conference on Information Technology in Bio- and Medical Informatics

Abstract

Recent advances in genomic technologies and the availability of large-scale datasets call for advanced data analysis techniques, such as data mining and statistical analysis, to name a few. A main goal in understanding cell mechanisms is to explain the relationships among genes and related molecular processes through the combined use of technological platforms and bioinformatics analysis. High-throughput platforms, such as microarrays, enable the investigation of the whole genome in a single experiment. Among the mining techniques proposed so far, cluster analysis has become a standard method for analysing microarray expression data. It can be used both for the initial screening of patients and for the extraction of disease molecular signatures. Moreover, clustering can be profitably exploited to characterize genes of unknown function and to uncover patterns that can be interpreted as indications of the status of cellular processes. Finally, clustering biological data is useful not only for exploring the data but also for discovering implicit links between the objects. However, a key feature lacking in many proposed approaches is the biological interpretation of the obtained results. In this paper, we discuss this issue by analysing the results obtained by several clustering algorithms with respect to their biological relevance.
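The abstract does not say how biological relevance is measured. One common way to make the idea concrete, assumed here purely for illustration, is an over-representation test: for each cluster, ask whether genes sharing a functional annotation appear more often than chance would predict. The sketch below uses a hypergeometric tail probability for this; the function names, gene identifiers, and annotation set are all hypothetical and are not taken from the paper.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the annotation of interest."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def cluster_enrichment(clusters, annotated, universe_size):
    """Return {cluster_id: p-value} for over-representation of the
    annotated genes in each cluster (smaller means more enriched)."""
    K = len(annotated)
    result = {}
    for cluster_id, genes in clusters.items():
        k = len(set(genes) & annotated)  # annotated genes found in this cluster
        result[cluster_id] = hypergeom_tail(k, len(genes), K, universe_size)
    return result


# Toy data, invented for illustration only (not results from the paper).
clusters = {
    "c1": ["g1", "g2", "g3", "g4"],
    "c2": ["g5", "g6", "g7", "g8"],
}
annotated = {"g1", "g2", "g3", "g5"}  # genes sharing one hypothetical functional term
pvals = cluster_enrichment(clusters, annotated, universe_size=8)
```

In this toy setting, cluster "c1" contains three of the four annotated genes and therefore receives a smaller p-value than "c2". A real analysis would repeat the test over many annotation terms and correct for multiple testing, which this sketch omits.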

Author information

Authors and Affiliations

ICAR-CNR, Italy
Elio Masciari
UCLA, USA
Giuseppe Massimiliano Mazzeo & Carlo Zaniolo
Magna Graecia University, Italy
Pietro Hiram Guzzi

Editor information

Editors and Affiliations

Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic
Miroslav Bursa
Department of Computer Science, San Jose State University, One Washington Square, 95192-0249, San Jose, CA, USA
Sami Khuri
Istituto di Informatica e Telematica del CNR, Via G. Moruzzi 1, 56124, Pisa, Italy
M. Elena Renda

About this paper

Cite this paper

Guzzi, P.H., Masciari, E., Mazzeo, G.M., Zaniolo, C. (2014). A Discussion on the Biological Relevance of Clustering Results. In: Bursa, M., Khuri, S., Renda, M.E. (eds) Information Technology in Bio- and Medical Informatics. ITBAM 2014. Lecture Notes in Computer Science, vol 8649. Springer, Cham. https://doi.org/10.1007/978-3-319-10265-8_3

DOI: https://doi.org/10.1007/978-3-319-10265-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10264-1
Online ISBN: 978-3-319-10265-8
eBook Packages: Computer Science (R0)
