Assessment of Microarray Data Clustering Results Based on a New Geometrical Index for Cluster Validity

  • Focus
Soft Computing

Abstract

A measure of cluster quality is often needed in DNA microarray data analysis. In this paper, we introduce a new cluster validity index that measures geometrical features of the data. The essential concept of the index is to evaluate the ratio of the squared total length of the data eigen-axes to the between-cluster separation. We show that this cluster validity index works well for data containing clusters that lie close together or differ in size. We verify the method using three simulated data sets, two real-world data sets and two microarray data sets. The experimental results show that the proposed index is superior to five other cluster validity indices: the partition coefficient (PC), the general silhouette index (GS), Dunn's index (DI), the CH index and the I-index. We also give a theorem that characterizes the situations in which the proposed index works well.
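For readers unfamiliar with the baseline indices named in the abstract, the following is a minimal sketch of two of them in their standard textbook forms: Bezdek's partition coefficient and Dunn's index. It is not the index proposed in the paper, whose exact formula the abstract does not give. The function names, the choice of Euclidean distance and the toy data are assumptions made for illustration only.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def partition_coefficient(memberships):
    """Partition coefficient: mean over points of the summed squared memberships.

    memberships[i][k] is the fuzzy degree to which point i belongs to
    cluster k; each row is assumed to sum to 1. The value is 1 for a crisp
    partition and 1/c for a maximally fuzzy one with c clusters.
    """
    n = len(memberships)
    return sum(u * u for row in memberships for u in row) / n


def dunn_index(points, labels):
    """Dunn's index: smallest between-cluster distance over largest diameter.

    Assumes Euclidean distance, single-linkage separation between clusters
    and complete diameter within clusters (one common variant).
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    groups = list(clusters.values())
    if len(groups) < 2:
        raise ValueError("Dunn's index needs at least two clusters")

    # Largest distance between two points of the same cluster.
    max_diameter = max(
        (dist(p, q) for g in groups for p, q in combinations(g, 2)),
        default=0.0,
    )
    if max_diameter == 0.0:
        raise ValueError("all clusters are single points; index is undefined")

    # Smallest distance between two points of different clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(groups, 2)
        for p in a
        for q in b
    )
    return min_separation / max_diameter


# Toy example (hypothetical data): two compact, well-separated clusters.
toy_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
toy_labels = [0, 0, 1, 1]
toy_dunn = dunn_index(toy_points, toy_labels)
toy_pc = partition_coefficient([[1.0, 0.0], [0.0, 1.0]])
```

Larger values of either index indicate a better partition, so in practice both are computed over a range of cluster counts and the maximizing count is selected.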

Author information

Corresponding author

Correspondence to Benson S. Y. Lam.

About this article

Cite this article

Lam, B.S.Y., Yan, H. Assessment of Microarray Data Clustering Results Based on a New Geometrical Index for Cluster Validity. Soft Comput 11, 341–348 (2007). https://doi.org/10.1007/s00500-006-0087-1

  • DOI: https://doi.org/10.1007/s00500-006-0087-1
