On combining multiple clusterings: an overview and a new perspective

Li, Tao; Ogihara, Mitsunori; Ma, Sheng

doi:10.1007/s10489-009-0160-4

Published: 12 February 2009

Applied Intelligence, volume 33, pages 207–219 (2010)

Abstract

Many problems can be reduced to the problem of combining multiple clusterings. In this paper, we first summarize the application scenarios for combining multiple clusterings and offer a new perspective that treats the problem as a categorical clustering problem. We then show the connections between various consensus and clustering criteria and discuss complexity results for the problem. Finally, we propose a new method for determining the final clustering. Experiments on kinship terms and on clustering popular music from heterogeneous feature sets show the effectiveness of combining multiple clusterings.
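To make the categorical view concrete, the sketch below treats each input clustering as one categorical attribute of the data points, so that n points clustered m times become an n × m categorical table. It is only an illustration of that representation, not the method proposed in the paper: the function names, the toy labels, and the use of a co-association matrix as a summary are all assumptions introduced here.

```python
import numpy as np


def as_categorical_table(clusterings):
    """Stack m label vectors (each of length n) into an n x m table.

    Column j holds the cluster label that clustering j gave to each point,
    so every point becomes a vector of m categorical values.
    """
    return np.asarray(clusterings).T


def one_hot(table):
    """Binary-encode each categorical column, one indicator per label."""
    blocks = []
    for j in range(table.shape[1]):
        labels = np.unique(table[:, j])
        # (n, 1) == (1, k) broadcasts to an (n, k) indicator block.
        blocks.append((table[:, [j]] == labels[None, :]).astype(int))
    return np.hstack(blocks)


def co_association(table):
    """Fraction of clusterings in which each pair of points shares a cluster."""
    binary = one_hot(table)
    return binary @ binary.T / table.shape[1]


# Hypothetical toy input: three clusterings of five points.
# The second is the first with its labels permuted.
toy_clusterings = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
toy_table = as_categorical_table(toy_clusterings)
toy_agreement = co_association(toy_table)
print(toy_table)
print(toy_agreement)
```

Any categorical clustering algorithm could then be run on the rows of the table. The co-association matrix is shown only because it does not depend on how each clustering happens to name its clusters.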

Author information

Authors and Affiliations

School of Computer Science, Florida International University, 11200 SW 8th Street, Miami, FL, 33199, USA
Tao Li
Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL, 33146, USA
Mitsunori Ogihara
Machine Learning for Systems, IBM T.J. Watson Research Center, 17 Skyline Drive, Hawthorne, NY, 10532, USA
Sheng Ma

Corresponding author

Correspondence to Tao Li.

About this article

Cite this article

Li, T., Ogihara, M. & Ma, S. On combining multiple clusterings: an overview and a new perspective. Appl Intell 33, 207–219 (2010). https://doi.org/10.1007/s10489-009-0160-4

Received: 09 August 2008
Accepted: 22 December 2008
Published: 12 February 2009
Issue Date: October 2010
DOI: https://doi.org/10.1007/s10489-009-0160-4
