
Effectivity of Internal Validation Techniques for Gene Clustering

  • Conference paper
Biological and Medical Data Analysis (ISBMDA 2006)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4345)


Abstract

Clustering is a major exploratory technique for gene expression data in the post-genomic era. Cluster validation techniques, essential tools within cluster analysis, can assess both the quality of clustering results and the performance of clustering algorithms, and so aid the interpretation of those results. In this work, the validation ability of the Silhouette index, Dunn’s index, the Davies-Bouldin index and the figure of merit (FOM) in gene clustering was investigated on public gene expression datasets clustered by hierarchical single-linkage and average-linkage clustering, K-means and self-organizing maps (SOMs). The results showed that the Silhouette index and FOM are preferable for validating both the performance of clustering algorithms and the quality of clustering results; that Dunn’s index should not be used directly for gene clustering validation because of its high susceptibility to outliers; and that the Davies-Bouldin index affords better validation than Dunn’s index, except for its preference for hierarchical single-linkage clustering.
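To make the four indices concrete, here is a minimal pure-Python sketch of their textbook definitions: Rousseeuw's mean silhouette width, Dunn's original ratio of minimum between-cluster point distance to maximum cluster diameter, the Davies-Bouldin average worst-case cluster similarity, and the aggregate FOM of Yeung et al. It is not the authors' implementation. Euclidean distance is an assumption (the abstract does not state the metric used), and the helper names (`euclid`, `group`, `fom`) as well as the toy `by_mean` clusterer and data are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Profile = Sequence[float]  # one gene's expression values across conditions


def euclid(a: Profile, b: Profile) -> float:
    """Euclidean distance (an assumed metric, not taken from the paper)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group(data: List[Profile], labels: List[int]) -> List[List[Profile]]:
    """Split profiles into clusters, ordered by sorted label value."""
    return [[p for p, l in zip(data, labels) if l == c] for c in sorted(set(labels))]


def centroid(cluster: List[Profile]) -> List[float]:
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def silhouette(data, labels) -> float:
    """Mean silhouette width s(i) = (b - a) / max(a, b); higher is better.
    Needs at least two clusters."""
    ids = sorted(set(labels))
    clusters = group(data, labels)
    total = 0.0
    for p, l in zip(data, labels):
        own = clusters[ids.index(l)]
        if len(own) == 1:
            continue  # singleton clusters contribute s(i) = 0
        a = sum(euclid(p, q) for q in own) / (len(own) - 1)  # self-distance is 0
        b = min(sum(euclid(p, q) for q in c) / len(c)
                for c in clusters if c is not own)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(data)


def dunn(data, labels) -> float:
    """Smallest between-cluster point distance over largest cluster diameter;
    higher is better."""
    clusters = group(data, labels)
    diameter = max(euclid(p, q) for c in clusters for p in c for q in c)
    separation = min(euclid(p, q)
                     for i, ci in enumerate(clusters) for cj in clusters[i + 1:]
                     for p in ci for q in cj)
    return separation / diameter


def davies_bouldin(data, labels) -> float:
    """Average over clusters of max_j (S_i + S_j) / d(c_i, c_j); lower is better."""
    clusters = group(data, labels)
    cents = [centroid(c) for c in clusters]
    scatter = [sum(euclid(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    k = len(clusters)
    return sum(max((scatter[i] + scatter[j]) / euclid(cents[i], cents[j])
                   for j in range(k) if j != i)
               for i in range(k)) / k


def fom(data, cluster_fn: Callable[[List[List[float]]], List[int]]) -> float:
    """Aggregate figure of merit: for each condition e, cluster the data with
    e left out, then take the RMS deviation of e from its cluster means.
    Summed over all conditions; lower is better."""
    n, m = len(data), len(data[0])
    total = 0.0
    for e in range(m):
        reduced = [[v for j, v in enumerate(p) if j != e] for p in data]
        labels = cluster_fn(reduced)
        sq = 0.0
        for c in set(labels):
            vals = [p[e] for p, l in zip(data, labels) if l == c]
            mu = sum(vals) / len(vals)
            sq += sum((v - mu) ** 2 for v in vals)
        total += math.sqrt(sq / n)
    return total


def by_mean(profiles: List[List[float]]) -> List[int]:
    """Hypothetical toy clusterer for the demo: splits genes at mean level 2.5.
    In practice this would be K-means, a SOM or a hierarchical method."""
    return [0 if sum(p) / len(p) < 2.5 else 1 for p in profiles]


if __name__ == "__main__":
    # Hypothetical toy data: two tight groups of genes over three conditions.
    data = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [5.0, 5.1, 5.0], [5.1, 5.0, 4.9]]
    good, mixed = [0, 0, 1, 1], [0, 1, 0, 1]
    for name, lab in (("good", good), ("mixed", mixed)):
        print(name, silhouette(data, lab), dunn(data, lab), davies_bouldin(data, lab))
    print("FOM", fom(data, by_mean))
```

FOM takes a clustering function rather than fixed labels because it re-runs the algorithm once per left-out condition. Because Dunn's index divides a single minimum distance by a single maximum diameter, one outlying gene can change it drastically, which is consistent with the caution above against using it directly.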




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yang, C., Wan, B., Gao, X. (2006). Effectivity of Internal Validation Techniques for Gene Clustering. In: Maglaveras, N., Chouvarda, I., Koutkias, V., Brause, R. (eds) Biological and Medical Data Analysis. ISBMDA 2006. Lecture Notes in Computer Science, vol 4345. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11946465_5


  • DOI: https://doi.org/10.1007/11946465_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68063-5

  • Online ISBN: 978-3-540-68065-9

  • eBook Packages: Computer Science, Computer Science (R0)
