Clustering Ensemble Based on Sample’s Certainty

Cognitive Computation

A Correction to this article was published on 23 November 2021

This article has been updated

Abstract

The objective of a clustering ensemble is to fuse multiple base partitions (BPs) to find the underlying data structure. A sample can change its neighbors from one BP to another, and the stability of these neighbor relationships differs from sample to sample. This difference suggests that samples contribute unequally to the detection of the underlying data structure. In addition, a clustering ensemble aims to integrate the inconsistent parts of the BPs by first extracting the consistent parts. However, existing clustering ensemble methods treat all samples equally: they consider neither the stability of a sample’s neighbor relationships nor whether the sample belongs to the consistent or the inconsistent part of the BPs. To address these deficiencies, we introduce the certainty of a sample to quantify the stability of its neighbor relationships and propose a formula to calculate it. We then develop a clustering ensemble algorithm based on the sample’s certainty. It rests on the following idea: the neighbor relationships of a cluster core are more stable across BPs, and different cluster cores usually do not form neighbor relationships in BPs. According to the sample’s certainty, the algorithm divides a dataset into two subsets: cluster core samples and cluster halo samples. It then discovers a clear core structure using the cluster core samples and gradually assigns the cluster halo samples to that structure. Experiments on six synthetic datasets illustrate how the algorithm works. On twelve real datasets, the algorithm outperforms twelve state-of-the-art clustering ensemble algorithms.
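The core/halo idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give the certainty formula, so the code uses an assumed stand-in score (how far a sample's pairwise co-association frequencies sit from 0.5, averaged over the other samples). The median split, the connected-components step for the core structure, and all function names (`co_association`, `certainty`, `core_halo_consensus`) are likewise hypothetical choices made only for illustration.

```python
import numpy as np

def co_association(partitions):
    """Fraction of base partitions in which each pair of samples shares a cluster."""
    P = np.asarray(partitions)                 # shape (n_partitions, n_samples)
    same = P[:, :, None] == P[:, None, :]      # shape (n_partitions, n, n)
    return same.mean(axis=0)

def certainty(A):
    """Assumed stand-in score, NOT the paper's formula.

    A pair that is always together (1.0) or never together (0.0) is stable;
    a pair at 0.5 is maximally unstable. Average |2a - 1| over the other samples.
    """
    n = A.shape[0]
    off_diagonal = ~np.eye(n, dtype=bool)
    deviation = np.abs(2.0 * A - 1.0)
    return (deviation * off_diagonal).sum(axis=1) / (n - 1)

def core_halo_consensus(partitions, threshold=None, link=0.5):
    """Split samples into core and halo by certainty, cluster cores, then attach halos."""
    A = co_association(partitions)
    c = certainty(A)
    if threshold is None:
        threshold = np.median(c)               # assumed split rule
    core = np.flatnonzero(c >= threshold)
    halo = np.flatnonzero(c < threshold)
    if core.size == 0:
        raise ValueError("threshold leaves no core samples")

    labels = np.full(A.shape[0], -1)
    n_clusters = 0
    # Core structure: connected components of the graph that links two core
    # samples when they co-occur in more than `link` of the base partitions.
    for seed in core:
        if labels[seed] != -1:
            continue
        labels[seed] = n_clusters
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in core:
                if labels[j] == -1 and A[i, j] > link:
                    labels[j] = n_clusters
                    stack.append(j)
        n_clusters += 1

    # Halo samples join gradually, most certain first, each going to the
    # cluster whose current members it co-occurs with most often on average.
    for i in halo[np.argsort(-c[halo])]:
        assigned = np.flatnonzero(labels >= 0)
        scores = [A[i, assigned[labels[assigned] == k]].mean()
                  for k in range(n_clusters)]
        labels[i] = int(np.argmax(scores))
    return labels, c

# Four toy base partitions of six samples; sample 2 switches sides in the third.
base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],   # same grouping as the first, with labels permuted
]
consensus_labels, sample_certainty = core_halo_consensus(base_partitions)
```

In this toy ensemble, sample 2 receives the lowest score and is treated as a halo sample, while the remaining samples form the core structure. Because the co-association matrix compares samples pairwise, the permuted labels in the fourth partition need no alignment.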

Notes

  1. https://archive.ics.uci.edu/ml/index.php

  2. http://glaros.dtc.umn.edu/gkhome/cluto/cluto/download

Funding

This work was supported in part by the Natural Science Foundation of China under Grants 61402004 and 61672034, in part by the Key Research and Development Program of Anhui Province under Grant 1804d08020309, in part by the Natural Science Foundation of Anhui Province under Grant 1908085MF188, and in part by the Key Project of Natural Science Foundation of Anhui Provincial Department of Education under Grant KJ2020A0041.

Author information

Corresponding author

Correspondence to Xuejun Li.

Ethics declarations

Ethical Approval

This article does not contain any studies with human or animal subjects performed by any of the authors.

Informed Consent

Informed consent was obtained from all individual participants included in the study.

Conflict of Interest

The authors declare no competing interests.

Additional information

The original version of this article has been revised. Funding information has been corrected.

About this article

Cite this article

Ji, X., Liu, S., Zhao, P. et al. Clustering Ensemble Based on Sample’s Certainty. Cogn Comput 13, 1034–1046 (2021). https://doi.org/10.1007/s12559-021-09876-z

  • DOI: https://doi.org/10.1007/s12559-021-09876-z
