Skip to main content

COMPACT: A Comparative Package for Clustering Assessment

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 3759))

Abstract

There exist numerous algorithms that cluster data-points from large-scale genomic experiments such as sequencing, gene-expression and proteomics. Such algorithms may employ distinct principles, and lead to different performance and results. The appropriate choice of a clustering method is a significant and often overlooked aspect in extracting information from large-scale datasets. Evidently, such choice may significantly influence the biological interpretation of the data. We present an easy-to-use and intuitive tool that compares some clustering methods within the same framework. The interface is named COMPACT for Comparative-Package-for-Clustering-Assessment. COMPACT first reduces the dataset’s dimensionality using the Singular Value Decomposition (SVD) method, and only then employs various clustering techniques. Besides its simplicity, and its ability to perform well on high-dimensional data, it provides visualization tools for evaluating the results. COMPACT was tested on a variety of datasets, from classical benchmarks to large-scale gene-expression experiments. COMPACT is configurable and expendable to newly added algorithms.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs (1988)

    MATH  Google Scholar 

  2. Sharan, R., Maron-Katz, A., Shamir, R.: CLICK and EXPANDER: a system for clustering and visualizing gene expression data. Bioinformatics 19(14), 1787–1799 (2003)

    Google Scholar 

  3. Eisen, M.B., Spellman, P.T., Brown, P.O., Botstein, D.: Cluster analysis and display of genome-wide expression patterns. In: Proc. Natl. Acad. Sci., USA, vol. 95(25), pp. 14863–14868 (1998)

    Google Scholar 

  4. Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., Lander, E.S.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537 (1999)

    Article  Google Scholar 

  5. Cheng, Y., Church, G.M.: Biclustering of Expression Data. In: Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, pp. 93–103. AAAI, Menlo Park (2000)

    Google Scholar 

  6. Spellman, P.T., Sherlock, G., Zhang, M.Q., Iyer, V.R., Anders, K., Eisen, M.B., Brown, P.O., Botstein, D., Futcher, B.P.T.: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9(12), 3273–3297 (1998)

    Google Scholar 

  7. Horn, D., Gottlieb, A.: Algorithm for data clustering in pattern recognition problems based on quantum mechanics. Phys. Rev. Lett. 88(1), 018702 (2002)

    Google Scholar 

  8. Yeang, C.H., Ramaswamy, S., Tamayo, P., Mukherjee, S., Rifkin, R.M., Angelo, M., Reich, M., Lander, E., Mesirov, J., Golub, T.C.H., Ramaswamy, S.: Molecular classification of multiple tumor types. Bioinformatics, 17 (Suppl. 1) S316–S322 (2001)

    Google Scholar 

  9. Pan, W.: A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments. Bioinformatics 18(4), 546–554 (2002)

    Google Scholar 

  10. Mukherjee, S., Tamayo, P., Rogers, S., Rifkin, R., Engle, A., Campbell, C., Golub, T.R., Mesirov, J.P.S.: Estimating dataset size requirements for classifying DNA microarray data. J. Comput. Biol. 10(2), 119–42 (2003)

    Google Scholar 

  11. Pan, W.: A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments. Bioinformatics 18(4), 546–554 (2002)

    Google Scholar 

  12. Alter, O., Brown, P.O., Botstein, D.: Singular value decomposition for genome-wide expression data processing and modeling. In: Proc. Natl. Acad. Sci. USA, vol. 97, pp. 10101-10106 (2000)

    Google Scholar 

  13. Horn, D., Axel, I.: Novel clustering algorithm for microarray expression data in a truncated SVD space. Bioinformatics 19(9), 1110–1115 (2003)

    Google Scholar 

  14. Friedman, N., Linial, M., Nachman, I., Pe’er, D.: Using Bayesian networks to analyze expression data. J. Comput. Biol. 7, 601–20 (2000)

    Google Scholar 

  15. Sasson, O., Linial, N., Linial, M.: The metric space of proteins-comparative study of clustering algorithms. Bioinformatics, 18 (Suppl. 1) S14–S21 (2002)

    Google Scholar 

  16. Sasson, O., Vaaknin, A., Fleischer, H., Portugaly, E., Bilu, Y., Linial, N., Linial, M.: ProtoNet: hierarchical classification of the protein space. Nucleic Acids Res. 31(1), 348–52 (2003)

    Google Scholar 

  17. The Eisen Lab software page, http://rana.lbl.gov/EisenSoftware.htm

  18. The R project for statistical computing, http://www.r-project.org/

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Varshavsky, R., Linial, M., Horn, D. (2005). COMPACT: A Comparative Package for Clustering Assessment. In: Chen, G., Pan, Y., Guo, M., Lu, J. (eds) Parallel and Distributed Processing and Applications - ISPA 2005 Workshops. ISPA 2005. Lecture Notes in Computer Science, vol 3759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11576259_18

Download citation

  • DOI: https://doi.org/10.1007/11576259_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29770-3

  • Online ISBN: 978-3-540-32115-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics