
A Discussion on the Biological Relevance of Clustering Results

  • Conference paper
Information Technology in Bio- and Medical Informatics (ITBAM 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8649)

Abstract

Recent advances in genomic technologies and the availability of large-scale datasets call for advanced data analysis techniques, such as data mining and statistical analysis, to name a few. A main goal in understanding cell mechanisms is to explain the relationships among genes and related molecular processes through the combined use of technological platforms and bioinformatics analysis. High-throughput platforms, such as microarrays, enable the investigation of the whole genome in a single experiment. Among the mining techniques proposed so far, cluster analysis has become a standard method for the analysis of microarray expression data. It can be used both for the initial screening of patients and for the extraction of disease molecular signatures. Moreover, clustering can be profitably exploited to characterize genes of unknown function and to uncover patterns that can be interpreted as indications of the status of cellular processes. Finally, clustering biological data is useful not only for exploring the data but also for discovering implicit links between the objects. However, a key feature lacking in many proposed approaches is the biological interpretation of the obtained results. In this paper, we discuss this issue by analysing the results obtained by several clustering algorithms with respect to their biological relevance.
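As an illustration of what assessing the biological relevance of a clustering can look like in practice, the sketch below clusters a toy gene expression matrix with plain k-means (Lloyd's algorithm) and then scores each cluster for functional over-representation with a one-sided hypergeometric test. This is a minimal sketch under stated assumptions, not the method evaluated in the paper: the gene identifiers, expression values, and annotation terms `TERM_A`/`TERM_B` are hypothetical placeholders standing in for real probe sets and, for example, Gene Ontology terms, and the helper names `kmeans`, `hypergeom_tail`, and `term_enrichment` are illustrative only.

```python
import math
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, max_iter=100):
    """Lloyd's k-means over a dict gene -> expression profile.

    Centroids are seeded with the first k profiles (insertion order) so the
    sketch is deterministic; real analyses usually use random restarts or
    k-means++ seeding. Returns the non-empty clusters as lists of gene ids.
    """
    genes = list(profiles)
    centroids = [list(profiles[g]) for g in genes[:k]]
    assignment = {}
    for _ in range(max_iter):
        # Assignment step: each gene goes to its nearest centroid.
        new_assignment = {
            g: min(range(k), key=lambda c: squared_distance(profiles[g], centroids[c]))
            for g in genes
        }
        if new_assignment == assignment:  # converged: no gene moved
            break
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its member profiles.
        for c in range(k):
            members = [profiles[g] for g in genes if assignment[g] == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    clusters = defaultdict(list)
    for g in genes:
        clusters[assignment[g]].append(g)
    return [clusters[c] for c in sorted(clusters)]


def hypergeom_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return hits / math.comb(N, n)


def term_enrichment(cluster, annotations, universe):
    """Map each term annotating a gene in `cluster` to its enrichment p-value."""
    N, n = len(universe), len(cluster)
    terms = {t for g in cluster for t in annotations.get(g, ())}
    result = {}
    for t in sorted(terms):
        K = sum(1 for g in universe if t in annotations.get(g, ()))
        k = sum(1 for g in cluster if t in annotations.get(g, ()))
        result[t] = hypergeom_tail(N, K, n, k)
    return result


# Hypothetical toy data: 6 genes measured under 3 conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0], "g2": [1.1, 2.1, 2.9], "g3": [0.9, 1.9, 3.1],
    "g4": [3.0, 2.0, 1.0], "g5": [3.1, 1.9, 0.9], "g6": [2.9, 2.1, 1.1],
}
# Hypothetical functional annotations (placeholders for e.g. GO terms).
annotations = {
    "g1": {"TERM_A"}, "g2": {"TERM_A"}, "g3": {"TERM_A", "TERM_B"},
    "g4": {"TERM_B"}, "g5": {"TERM_B"}, "g6": set(),
}

if __name__ == "__main__":
    for i, cluster in enumerate(kmeans(profiles, 2)):
        print(i, cluster, term_enrichment(cluster, annotations, list(profiles)))
```

A low p-value suggests that a cluster groups genes sharing a function more often than chance alone would predict, which is one common reading of "biologically relevant". The six-gene universe here is far too small for any result to be meaningful, and a real analysis would also correct for testing many terms across many clusters (for example, with the Benjamini-Hochberg procedure).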




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Guzzi, P.H., Masciari, E., Mazzeo, G.M., Zaniolo, C. (2014). A Discussion on the Biological Relevance of Clustering Results. In: Bursa, M., Khuri, S., Renda, M.E. (eds) Information Technology in Bio- and Medical Informatics. ITBAM 2014. Lecture Notes in Computer Science, vol 8649. Springer, Cham. https://doi.org/10.1007/978-3-319-10265-8_3


  • DOI: https://doi.org/10.1007/978-3-319-10265-8_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10264-1

  • Online ISBN: 978-3-319-10265-8

  • eBook Packages: Computer Science, Computer Science (R0)
