Abstract
Numerous algorithms exist for clustering data points from large-scale genomic experiments such as sequencing, gene expression and proteomics. These algorithms rest on distinct principles and can differ in both performance and results. Choosing an appropriate clustering method is a significant and often overlooked step in extracting information from large-scale datasets, and the choice may significantly influence the biological interpretation of the data. We present an easy-to-use and intuitive tool that compares several clustering methods within a single framework. The interface is named COMPACT, for Comparative-Package-for-Clustering-Assessment. COMPACT first reduces the dataset's dimensionality using Singular Value Decomposition (SVD), and only then applies the various clustering techniques. Besides being simple and performing well on high-dimensional data, it provides visualization tools for evaluating the results. COMPACT was tested on a variety of datasets, from classical benchmarks to large-scale gene-expression experiments. COMPACT is configurable and extensible to newly added algorithms.
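The two-stage idea described above (reduce dimensionality with SVD, then cluster in the reduced space) can be illustrated with a minimal sketch. This is not COMPACT's implementation: the function names, the column centering, the choice of k-means with farthest-first initialization, and the synthetic data are all assumptions made for illustration only.

```python
import numpy as np


def reduce_with_svd(data, n_components):
    """Project the rows of `data` onto the leading singular directions.

    Assumption: columns are mean-centered first; the abstract does not say
    which preprocessing COMPACT applies.
    """
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    # Coordinates of each row in the truncated SVD space.
    return u[:, :n_components] * s[:n_components]


def kmeans(points, k, n_iter=100):
    """Plain k-means with deterministic farthest-first initialization.

    Stands in for "a clustering technique"; it is one possible choice,
    not a method named by the source.
    """
    centers = [points[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def make_demo_data(seed=0):
    """Synthetic stand-in for an expression matrix: 40 samples, 50 features."""
    rng = np.random.default_rng(seed)
    group_a = rng.normal(0.0, 0.1, size=(20, 50))
    group_b = rng.normal(3.0, 0.1, size=(20, 50))
    return np.vstack([group_a, group_b])


if __name__ == "__main__":
    reduced = reduce_with_svd(make_demo_data(), n_components=2)
    print(kmeans(reduced, k=2))
```

Clustering in the truncated space keeps the distance computations cheap and discards low-variance directions. Comparing methods in the sense of the abstract would mean running several clustering functions on the same `reduced` array and inspecting their labels side by side.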
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Varshavsky, R., Linial, M., Horn, D. (2005). COMPACT: A Comparative Package for Clustering Assessment. In: Chen, G., Pan, Y., Guo, M., Lu, J. (eds) Parallel and Distributed Processing and Applications - ISPA 2005 Workshops. ISPA 2005. Lecture Notes in Computer Science, vol 3759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11576259_18
DOI: https://doi.org/10.1007/11576259_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29770-3
Online ISBN: 978-3-540-32115-6
eBook Packages: Computer Science, Computer Science (R0)