research-article

Weighted cluster ensembles: Methods and analysis

Authors:

Carlotta Domeniconi,

Muna Al-Razgan

ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 2, Issue 4

Article No.: 17, Pages 1 - 40

https://doi.org/10.1145/1460797.1460800

Published: 16 January 2009

Abstract

Cluster ensembles offer a solution to challenges that arise from the ill-posed nature of clustering. By leveraging the consensus across multiple clustering results, cluster ensembles can provide robust and stable solutions while averaging out the spurious structures that emerge from the various biases to which each participating algorithm is tuned. In this article, we address the problem of combining multiple weighted clusters that belong to different subspaces of the input space. We leverage the diversity of the input clusterings to generate a consensus partition that is superior to the participating ones. Since we are dealing with weighted clusters, our consensus functions make use of the weight vectors associated with the clusters. We demonstrate the effectiveness of our techniques through experiments on several real datasets, including high-dimensional text data. Furthermore, we investigate in depth the issue of diversity and accuracy for our ensemble methods. Our analysis and experimental results show that the proposed techniques are capable of producing a partition that is as good as or better than the best individual clustering.
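The abstract describes consensus functions that merge several input clusterings into a single partition. This page does not give the article's weighted formulation, so the sketch below is not the authors' method. It shows the plain, unweighted co-association (evidence accumulation) consensus, which ignores the clusters' weight vectors entirely. It assumes each input clustering is a list of integer labels over the same n points; the function names `co_association` and `consensus` and the majority threshold of 0.5 are illustrative choices, not taken from the article. The matrix is cut with single linkage at a fixed threshold, which amounts to taking connected components of the graph of pairs that share a cluster in more than half of the clusterings.

```python
from itertools import combinations


def co_association(labelings):
    """Return an n x n matrix whose (i, j) entry is the fraction of the
    input clusterings that put points i and j in the same cluster."""
    m = len(labelings)
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        # Only label equality matters, so cluster ids need not be aligned
        # across clusterings.
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    sim = [[c / m for c in row] for row in counts]
    for i in range(n):
        sim[i][i] = 1.0
    return sim


def consensus(labelings, threshold=0.5):
    """Join points whose co-association exceeds `threshold` (a single-link
    cut) and return the connected components as labels 0, 1, 2, ..."""
    sim = co_association(labelings)
    n = len(sim)
    parent = list(range(n))  # union-find forest over the points

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if sim[i][j] > threshold:
            parent[find(i)] = find(j)

    # Number the components in order of their first point.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


if __name__ == "__main__":
    labelings = [
        [0, 0, 0, 1, 1, 1],
        [1, 1, 1, 0, 0, 0],  # same partition as above, labels swapped
        [0, 0, 1, 1, 2, 2],  # a noisier clustering with three clusters
    ]
    print(consensus(labelings))  # [0, 0, 0, 1, 1, 1]
```

Because co-association only asks whether two points share a cluster, it is invariant to how each clustering numbers its clusters; this is why the first two inputs in the example reinforce each other even though their labels are swapped, and the noisier third clustering is outvoted.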

Index Terms

Weighted cluster ensembles: Methods and analysis
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Recommendations

Cluster ensembles --- a knowledge reuse framework for combining multiple partitions

This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings. We first identify several application ...

TWCC

A co-clustering method, TWCC, was proposed in which two types of weights are automatically computed. It is the first two-way subspace weighting partitional co-clustering method. It can simultaneously weight data from two ways for co-clustering. Experimental ...

Evaluation of Stability of k-Means Cluster Ensembles with Respect to Random Initialization

Many clustering algorithms, including cluster ensembles, rely on a random component. Stability of the results across different runs is considered to be an asset of the algorithm. The cluster ensembles considered here are based on k-means clusterers. ...

Information & Contributors

Information

Published In

ACM Transactions on Knowledge Discovery from Data Volume 2, Issue 4

January 2009

154 pages

ISSN:1556-4681

EISSN:1556-472X

DOI:10.1145/1460797

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 January 2009

Accepted: 01 August 2008

Revised: 01 June 2008

Received: 01 August 2007

Published in TKDD Volume 2, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

Division of Information and Intelligent Systems

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 146

Total Downloads: 1,719

Downloads (last 12 months): 24
Downloads (last 6 weeks): 0

Reflects downloads up to 05 Mar 2025

Citations

Cited By

Ji X, Sun J, Peng J, Pang Y, Zhou P (2025). Clustering Ensemble Based on Fuzzy Matrix Self-Enhancement. IEEE Transactions on Knowledge and Data Engineering 37:1 (148-161). DOI: 10.1109/TKDE.2024.3489553. Online publication date: 1-Jan-2025.
https://dl.acm.org/doi/10.1109/TKDE.2024.3489553
Li S, Zhao P, Wang H, Wang H, Li T (2025). Neighbor self-embedding graph model for clustering ensemble. Applied Soft Computing 171 (112844). DOI: 10.1016/j.asoc.2025.112844. Online publication date: Mar-2025.
https://doi.org/10.1016/j.asoc.2025.112844
Ren X, Yang Y (2025). Semi-supervised symmetric non-negative matrix factorization with graph quality improvement and constraints. Applied Intelligence 55:6. DOI: 10.1007/s10489-025-06282-y. Online publication date: 1-Apr-2025.
https://dl.acm.org/doi/10.1007/s10489-025-06282-y
Hao Z, Lu Z, Li G, Nie F, Wang R, Li X (2024). Ensemble Clustering With Attentional Representation. IEEE Transactions on Knowledge and Data Engineering 36:2 (581-593). DOI: 10.1109/TKDE.2023.3292573. Online publication date: 1-Feb-2024.
https://dl.acm.org/doi/10.1109/TKDE.2023.3292573
Alziati M, Amarù F, Magri L, Arrigoni F (2024). Ensemble clustering via synchronized relabelling. Pattern Recognition Letters 184 (176-182). DOI: 10.1016/j.patrec.2024.06.026. Online publication date: Aug-2024.
https://doi.org/10.1016/j.patrec.2024.06.026
Zhang Z, Chen X, Wang C, Wang R, Song W, Nie F (2024). A Structured Bipartite Graph Learning method for ensemble clustering. Pattern Recognition (111133). DOI: 10.1016/j.patcog.2024.111133. Online publication date: Nov-2024.
https://doi.org/10.1016/j.patcog.2024.111133
Mahmud M, Huang J, García S (2024). Clustering approximation via a fusion of multiple random samples. Information Fusion 101:C. DOI: 10.1016/j.inffus.2023.101986. Online publication date: 1-Jan-2024.
https://dl.acm.org/doi/10.1016/j.inffus.2023.101986
Lin H, Liu H, Wu J, Li H, Günnemann S (2023). Algorithm 1038: KCC: A MATLAB Package for k-Means-based Consensus Clustering. ACM Transactions on Mathematical Software 49:4 (1-27). DOI: 10.1145/3616011. Online publication date: 15-Dec-2023.
https://dl.acm.org/doi/10.1145/3616011
He G, Jiang W, Peng R, Yin M, Han M (2023). Soft Subspace Based Ensemble Clustering for Multivariate Time Series Data. IEEE Transactions on Neural Networks and Learning Systems 34:10 (7761-7774). DOI: 10.1109/TNNLS.2022.3146136. Online publication date: Oct-2023.
https://doi.org/10.1109/TNNLS.2022.3146136
Mahmud M, Huang J, Ruby R, Ngueilbaye A, Wu K (2023). Approximate Clustering Ensemble Method for Big Data. IEEE Transactions on Big Data 9:4 (1142-1155). DOI: 10.1109/TBDATA.2023.3255003. Online publication date: 1-Aug-2023.
https://doi.org/10.1109/TBDATA.2023.3255003