Weighted cluster ensembles: Methods and analysis

Published: 16 January 2009

Abstract

Cluster ensembles offer a solution to challenges inherent to clustering arising from its ill-posed nature. Cluster ensembles can provide robust and stable solutions by leveraging the consensus across multiple clustering results, while averaging out emergent spurious structures that arise due to the various biases to which each participating algorithm is tuned. In this article, we address the problem of combining multiple weighted clusters that belong to different subspaces of the input space. We leverage the diversity of the input clusterings in order to generate a consensus partition that is superior to the participating ones. Since we are dealing with weighted clusters, our consensus functions make use of the weight vectors associated with the clusters. We demonstrate the effectiveness of our techniques by running experiments with several real datasets, including high-dimensional text data. Furthermore, we investigate in depth the issue of diversity and accuracy for our ensemble methods. Our analysis and experimental results show that the proposed techniques are capable of producing a partition that is as good as or better than the best individual clustering.
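
As a rough illustration of what "leveraging the consensus across multiple clustering results" can look like in practice, the sketch below builds a co-association matrix from several hard partitions and groups points that most partitions agree on. This is a generic, unweighted evidence-accumulation scheme chosen for brevity, not the weighted, subspace-aware consensus functions proposed in the article. The function names, the 0.5 threshold, and the toy labelings are all assumptions made for the example.

```python
import numpy as np

def co_association(labelings):
    """Fraction of input clusterings that place each pair of points together."""
    labelings = np.asarray(labelings)  # shape: (n_clusterings, n_points)
    same = labelings[:, :, None] == labelings[:, None, :]
    return same.mean(axis=0)

def consensus_by_threshold(coassoc, threshold=0.5):
    """Link points whose co-association exceeds the threshold (hypothetical
    consensus rule) and return the connected components as cluster labels."""
    n = coassoc.shape[0]
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(coassoc[i] > threshold):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Toy ensemble: three partitions of six points (made-up data).
labelings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as the first, labels permuted
    [0, 0, 1, 1, 2, 2],  # a noisier partition with a spurious third cluster
]
C = co_association(labelings)
consensus = consensus_by_threshold(C)
print(consensus)
```

The co-association matrix is insensitive to label permutations across partitions, and the spurious split introduced by the third partition is outvoted by the other two. A weighted variant would replace the equality test with a similarity that also accounts for the per-cluster weight vectors.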

Published In

ACM Transactions on Knowledge Discovery from Data  Volume 2, Issue 4
January 2009
154 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/1460797
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 January 2009
Accepted: 01 August 2008
Revised: 01 June 2008
Received: 01 August 2007
Published in TKDD Volume 2, Issue 4

Author Tags

  1. Cluster ensembles
  2. accuracy and diversity measures
  3. consensus functions
  4. data mining
  5. subspace clustering
  6. text data

Qualifiers

  • Research-article
  • Research
  • Refereed
