A fragment-based iterative consensus clustering algorithm with a robust similarity

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Consensus clustering combines multiple clustering results without accessing the original data. It can be used to improve the robustness of clustering results or to obtain a clustering from multiple data sources. In this paper, we propose a novel definition of the similarity between points and clusters. Used within an iterative process, this similarity clearly indicates when a point should join or leave a cluster, determines the number of clusters automatically, and allows partially overlapping clustering results to be combined. We also incorporate the concept of the “clustering fragment” into our method to increase its speed. Experimental results show that our algorithm performs well on both artificial and real data.
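To make the fragment idea concrete, here is a minimal sketch in Python. It is not the paper's algorithm. It assumes a fragment is a maximal group of points that no base clustering splits, which is how the term is commonly used in fragment-based ensemble work; the paper's exact definition is not visible in this preview. The function name `clustering_fragments` and the toy labelings are hypothetical. Note that only the label vectors are used, never the original data.

```python
from collections import defaultdict


def clustering_fragments(labelings):
    """Group point indices that share a label in every base clustering.

    `labelings` is a list of label sequences, one per base clustering,
    all covering the same points in the same order (an assumption of
    this sketch; partially overlapping inputs are not handled here).
    """
    if not labelings:
        return []
    n = len(labelings[0])
    if any(len(labels) != n for labels in labelings):
        raise ValueError("all base clusterings must label the same points")

    groups = defaultdict(list)
    for i in range(n):
        # The tuple of labels a point receives across all base clusterings.
        signature = tuple(labels[i] for labels in labelings)
        groups[signature].append(i)
    # Points with identical signatures are never separated, so each
    # group can be treated as a single unit by a consensus step.
    return list(groups.values())


# Three hypothetical base clusterings of six points.
base = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [5, 5, 5, 5, 7, 7],
]
fragments = clustering_fragments(base)
print(fragments)  # [[0, 1], [2], [3], [4, 5]]
```

In this toy case six points collapse into four fragments, so a later consensus step would compare four units rather than six. That reduction is the kind of speed-up the abstract alludes to, although the actual gain depends on how much the base clusterings agree.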

Author information

Corresponding author

Correspondence to Bi-Ru Dai.

About this article

Cite this article

Chung, CH., Dai, BR. A fragment-based iterative consensus clustering algorithm with a robust similarity. Knowl Inf Syst 41, 591–609 (2014). https://doi.org/10.1007/s10115-013-0667-1

  • DOI: https://doi.org/10.1007/s10115-013-0667-1
