Abstract
Consensus clustering combines multiple clustering results without accessing the original data. It can be used to improve the robustness of clustering results or to obtain clustering results from multiple data sources. In this paper, we propose a novel definition of the similarity between points and clusters. Used within an iterative process, this similarity clearly indicates whether a point should join or leave a cluster, determines the number of clusters automatically, and allows partially overlapping clustering results to be combined. We also incorporate the concept of "clustering fragments" into our method to increase speed. Experimental results show that our algorithm performs well on both artificial and real data.
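
As a rough illustration of the fragment idea mentioned in the abstract, the following is a minimal Python sketch, not the paper's implementation. It assumes that a fragment is a maximal set of points that every base clustering places in the same cluster, and that every base clustering labels every point (so it does not cover the partially overlapping case). The function name `find_fragments` and the toy labelings are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def find_fragments(labelings):
    """Group point indices that share a label in every base clustering.

    `labelings` is a list of label sequences, one per base clustering,
    all of the same length (assumption: no missing points).
    """
    n_points = len(labelings[0])
    if any(len(labels) != n_points for labels in labelings):
        raise ValueError("every base clustering must label every point")

    groups = defaultdict(list)
    for i in range(n_points):
        # The tuple of labels across all base clusterings identifies
        # the fragment that point i belongs to.
        signature = tuple(labels[i] for labels in labelings)
        groups[signature].append(i)
    return list(groups.values())


# Three hypothetical base clusterings of six points.
base_clusterings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]

fragments = find_fragments(base_clusterings)
print(fragments)  # [[0, 1], [2], [3], [4, 5]]
```

In this toy case six points collapse into four fragments, so a consensus procedure that operates on fragments instead of individual points has fewer objects to process.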




Cite this article
Chung, CH., Dai, BR. A fragment-based iterative consensus clustering algorithm with a robust similarity. Knowl Inf Syst 41, 591–609 (2014). https://doi.org/10.1007/s10115-013-0667-1
DOI: https://doi.org/10.1007/s10115-013-0667-1