Skip to main content
Log in

A comprehensive study of clustering ensemble weighting based on cluster quality and diversity

  • Theoretical Advances
  • Published:
Pattern Analysis and Applications Aims and scope Submit manuscript

Abstract

Clustering as a major task in data mining is responsible for discovering hidden patterns in unlabeled datasets. Finding the best clustering is also considered as one of the most challenging problems in data mining. Due to the problem complexity and the weaknesses of primary clustering algorithm, a large part of research has been directed toward ensemble clustering methods. Ensemble clustering aggregates a pool of base clusterings and produces an output clustering that is also named consensus clustering. The consensus clustering is usually better clustering than the output clusterings of the basic clustering algorithms. However, lack of quality in base clusterings makes their consensus clustering weak. In spite of some researches in selection of a subset of high quality base clusterings based on a clustering assessment metric, cluster-level selection has been always ignored. In this paper, a new clustering ensemble framework has been proposed based on cluster-level weighting. The certainty amount that the given ensemble has about a cluster is considered as the reliability of that cluster. The certainty amount that the given ensemble has about a cluster is computed by the accretion amount of that cluster by the ensemble. Then by selecting the best clusters and assigning a weight to each selected cluster based on its reliability, the final ensemble is created. After that, the paper proposes cluster-level weighting co-association matrix instead of traditional co-association matrix. Then, two consensus functions have been introduced and used for production of the consensus partition. The proposed framework completely overshadows the state-of-the-art clustering ensemble methods experimentally.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Similar content being viewed by others

References

  1. Alizadeh H, Minaei-Bidgoli B, Amirgholipour SK (2009) A new method for improving the performance of K nearest neighbor using clustering technique. Int J Converg Inf Technol JCIT. ISSN: 1975-9320

  2. Alizadeh H, Minaei-Bidgoli B, Parvin H (2013) Optimizing fuzzy cluster ensemble in string representation. IJPRAI 27(2). https://doi.org/10.1142/S0218001413500055

  3. Alizadeh H, Minaei-Bidgoli B, Parvin H (2014) Cluster ensemble selection based on a new cluster stability measure. Intell Data Anal 18(3):389–408

    Article  Google Scholar 

  4. Alizadeh H, Minaei-Bidgoli B, Parvin H (2014) To improve the quality of cluster ensembles by selecting a subset of base clusters. J Exp Theor Artif Intell 26(1):127–150

    Article  Google Scholar 

  5. Alizadeh H, Yousefnezhad M, Minaei-Bidgoli B (2015) Wisdom of crowds cluster ensemble. Intell Data Anal 19(3):485–503

    Article  Google Scholar 

  6. Aminsharifi A, Irani D, Pooyesh S, Parvin H, Dehghani S, Yousofi K, Fazel E, Zibaie F (2017) Artificial neural network system to predict the postoperative outcome of percutaneous nephrolithotomy. J Endourol 31(5):461–467

    Article  Google Scholar 

  7. Ana LNF, Jain AK (2003) Robust data clustering. In: Proceedings. 2003 IEEE computer society conference on computer vision and pattern recognition, 2003, vol 2. IEEE, pp II-128

  8. Ayad HG, Kamel MS (2008) Cumulative voting consensus method for partitions with a variable number of clusters. IEEE Trans Pattern Anal Mach Intell 30(1):160–173

    Article  Google Scholar 

  9. Charon I, Denoeud L, Guénoche A, Hudry O (2006) Maximum transfer distance between partitions. J Classif 23(1):103–121

    Article  MathSciNet  MATH  Google Scholar 

  10. Coretto P, Hennig C (2010) A simulation study to compare robust clustering methods based on mixtures. Adv Data Anal Classif 4:111–135

    Article  MathSciNet  MATH  Google Scholar 

  11. Cristofor D, Simovici D (2002) Finding median partitions using information-theoretical-based genetic algorithms. J Univ Comput Sci 8(2):153–172

    MathSciNet  MATH  Google Scholar 

  12. Denoeud L (2008) Transfer distance between partitions. Adv Data Anal Classif 2:279–294

    Article  MathSciNet  MATH  Google Scholar 

  13. Di Gesù V (1994) Integrated fuzzy clustering. Fuzzy Sets Syst 68(3):293–308

    Article  Google Scholar 

  14. Domeniconi C, Al-Razgan M (2009) Weighted cluster ensembles: methods and analysis. ACM Trans Knowl Discov Data (TKDD) 2(4):17

    Google Scholar 

  15. Dueck D (2009) Affinity propagation: clustering data by passing messages. Ph.D. dissertation, University of Toronto

  16. Faceli K, Marcilio CP, Souto D (2006) Multi-objective clustering ensemble. In: Proceedings of the sixth international conference on hybrid intelligent systems (HIS’06)

  17. Fern XZ, Brodley CE (2003) Random projection for high dimensional data clustering: a cluster ensemble approach. In: ICML, vol 3, pp 186–193

  18. Fern XZ, Brodley CE (2004) Solving cluster ensemble problems by bipartite graph partitioning. In Proceedings of international conference on machine learning (ICML)

  19. Fern XZ, Lin W (2008) Cluster ensemble selection. In: SIAM international conference on data mining

  20. Franek L, Jiang X (2014) Ensemble clustering by means of clustering embedding in vector spaces. Pattern Recognit 47(2):833–842

    Article  MATH  Google Scholar 

  21. Fred A (2001) Finding consistent clusters in data partitions. In: Multiple classifier systems. Springer, Berlin, Heidelberg, pp 309–318

  22. Fred A, Jain AK (2002) Data clustering using evidence accumulation. In: Proceedings of the 16th international conference on pattern recognition, ICPR02, Quebec City, pp 276–280

  23. Fred ALN, Jain AK (2005) Combining multiple clusterings using evidence accumulation. IEEE Trans Pattern Anal Mach Intell 27(6):835–850

    Article  Google Scholar 

  24. Fred A, Jain AK (2006) Learning pairwise similarity for data clustering. In: International conference on pattern recognition

  25. Fred A, Lourenco A (2008) Cluster ensemble methods: from single clusterings to combined solutions. Stud Comput Intell (SCI) 126:3–30

    Google Scholar 

  26. Friedman JH, Meulman JJ (2002) Clustering objects on subsets of attributes. Technical report, Stanford University

  27. García-Escudero LA, Gordaliza A, Matrán C, Mayo-Iscar A (2010) A review of robust clustering methods. Adv Data Anal Classif 4:89–109

    Article  MathSciNet  MATH  Google Scholar 

  28. Guénoche A (2011) Consensus of partitions: a constructive approach. Adv Data Anal Classif 5:215–229

    Article  MathSciNet  MATH  Google Scholar 

  29. Gullo F, Tagarelli A, Greco S (2009) Diversity-based weighting schemes for clustering ensembles. SIAM, pp 437–448

  30. Hennig C (2008) Dissolution point and isolation robustness: robustness criteria for general cluster analysis methods. J Multivar Anal 99:1154–1176

    Article  MathSciNet  MATH  Google Scholar 

  31. Hu X, Yoo I (2004) Cluster ensemble and its applications in gene expression analysis. In: Proceedings of the second conference on Asia-Pacific bioinformatics-Volume 29. Australian Computer Society, Inc, pp 297–302

  32. Huang D, Lai JH, Wang CD (2015) Combining multiple clusterings via crowd agreement estimation and multi-granularity link analysis. Neurocomputing 170:240–250

    Article  Google Scholar 

  33. Iam-On N, Boongoen T, Garrett S (2008) Refining pairwise similarity matrix for cluster ensemble problem with cluster relations. In: Proceedings of international conference on discovery science (ICDS), pp 222–233

  34. Iam-On N, Boongoen T, Garrett S, Price C (2011) A link-based approach to the cluster ensemble problem. IEEE Trans Pattern Anal Mach Intell 33(12):2396–2409

    Article  Google Scholar 

  35. Jain AK (2010) Data clustering: 50 years beyond k-means. Pattern Recognit Lett 31(8):651–666

    Article  Google Scholar 

  36. Jamalinia H, Khalouei S, Rezaie V, Nejatian S, Bagheri-Fard K, Parvin H (2017) Diverse classifier ensemble creation based on heuristic dataset modification. J Appl Stat. https://doi.org/10.1080/02664763.2017.1363163 (in press)

    Article  Google Scholar 

  37. Kleinberg J (2002) An impossibility theorem for clustering. In: Proceedings of Neural Information Processing Systems'02 (NIPS 2002). pp 446–453

  38. Kuncheva LI (2004) Combining pattern classifiers, methods and algorithms. Wiley, New York

    Book  MATH  Google Scholar 

  39. Kuncheva LI, Hadjitodorov ST (2004) Using diversity in cluster ensembles. In: 2004 IEEE international conference on systems, man and cybernetics, vol 2. IEEE, pp 1214–1219

  40. Law MHC, Topchy AP, Jain AK (2004) Multiobjective data clustering. In: IEEE conference on computer vision and pattern recognition, vol 2, pp 424–430

  41. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

    Article  Google Scholar 

  42. Li T, Ding C (2008) Weighted consensus clustering. In: Proceedings of SIAM international conference on data mining (SDM)

  43. Li Z, Wu XM, Chang SF (2012) Segmentation using superpixels: a bipartite graph partitioning approach. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR)

  44. Lior R, Maimon O (2005) Data mining and knowledge discovery handbook. Springer, Berlin

    MATH  Google Scholar 

  45. Minaei-Bidgoli B, Parvin H, Alinejad-Rokny H, Alizadeh H, Punch WF (2014) Effects of resampling method and adaptation on clustering ensemble efficacy. Artif Intell Rev 41(1):27–48

    Article  Google Scholar 

  46. Mohammadi Jenghara M, Ebrahimpour-Komleh H, Parvin H (2017) Dynamic protein–protein interaction networks construction using firefly algorithm. Pattern Anal Appl. https://doi.org/10.1007/s10044-017-0626-7 (in press)

    Article  Google Scholar 

  47. Mohammadi Jenghara M, Ebrahimpour-komleh H, Rezaie V, Nejatian S, Parvin H, Syed-Yusof SK (2017) Imputing missing value through ensemble concept based on statistical measures. Knowl Inf Syst. https://doi.org/10.1007/s10115-017-1118-1 (in press)

    Article  Google Scholar 

  48. Monti S, Tamayo P, Mesirov J, Golub T (2003) Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 52(1–2):91–118

    Article  MATH  Google Scholar 

  49. Nejatian S, Omidvar R, Mohamadi H, Eskandar-Baghbani A, Rezaie V, Parvin H (2017) An optimization algorithm based on behavior of see-see partridge chicks. J Intell Fuzzy Syst. https://doi.org/10.3233/JIFS-161718 (in press)

    Article  Google Scholar 

  50. Nejatian S, Parvin H, Faraji E (2017) Using sub-sampling and ensemble clustering techniques to improve performance of imbalanced classification. Neurocomputing. https://doi.org/10.1016/j.neucom.2017.06.082 (in press)

    Article  Google Scholar 

  51. Newman CBDJ, Hettich S, Merz C (1998) UCI repository of machine learning databases. http://www.ics.uci.edu/˜mlearn/MLSummary.html

  52. Parvin H, Minaei-Bidgoli B (2013) A clustering ensemble framework based on elite selection of weighted clusters. Adv Data Anal Classif 7(2):181–208

    Article  MathSciNet  MATH  Google Scholar 

  53. Parvin H, Minaei-Bidgoli B (2015) A clustering ensemble framework based on selection of fuzzy weighted clusters in a locally adaptive clustering algorithm. Pattern Anal Appl 18(1):87–112

    Article  MathSciNet  MATH  Google Scholar 

  54. Parvin H, Alizadeh H, Minaei-Bidgoli B (2009a) A new method for constructing classifier ensembles. Int J Digit Content Technol Appl JDCTA. ISSN: 1975-9339 (in press)

  55. Parvin H, Alizadeh H, Minaei-Bidgoli B (2009b) Using clustering for generating diversity in classifier ensemble. Int J Digit Content Technol Appl JDCTA 3(1):51–57. ISSN: 1975-9339

  56. Parvin H, Beigi A, Mozayani N (2012) A clustering ensemble learning method based on the ant colony clustering algorithm. Int J Appl Comput Math 11(2):286–302

    MathSciNet  Google Scholar 

  57. Parvin H, Alinejad-Rokny H, Minaei-Bidgoli B, Parvin S (2013) A new classifier ensemble methodology based on subspace learning. J Exp Theor Artif Intell 25(2):227–250

    Article  MATH  Google Scholar 

  58. Parvin H, Minaei-Bidgoli B, Alinejad-Rokny H, Punch WF (2013) Data weighing mechanisms for clustering ensembles. Comput Electr Eng 39(5):1433–1450

    Article  Google Scholar 

  59. Parvin H, Mirnabibaboli M, Alinejad-Rokny H (2015) Proposing a classifier ensemble framework based on classifier selection and decision tree. Eng Appl AI 37:34–42

    Article  Google Scholar 

  60. Schynsa M, Haesbroeck G, Critchley F (2010) RelaxMCD: smooth optimisation for the minimum covariance determinant estimator. Comput Stat Data Anal 54:843–857

    Article  MathSciNet  MATH  Google Scholar 

  61. Sevillano X, Cobo G, Alías F, Socoró JC (2006) Feature diversity in cluster ensembles for robust document clustering. In: Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 697–698

  62. Strehl A, Ghosh J (2002) Cluster ensembles-a knowledge reuse framework for combining partitionings. In: AAAI/IAAI, pp 93–99

  63. Strehl A, Ghosh J (2003) Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617

    MathSciNet  MATH  Google Scholar 

  64. Tavana M, Parvin H, Rezazadeh F (2017) Parkinson detection: an image processing approach. J Med Imaging Health Inf 7:464–472

    Article  Google Scholar 

  65. Topchy A, Jain AK, Punch WF (2003) Combining multiple weak clusterings. In: Proceedings of the 3rd IEEE international conference on data mining, pp 331–338

  66. Topchy A, Jain AK, Punch W (2005) Clustering ensembles: models of consensus and weak partitions. IEEE Trans Pattern Anal Mach Intell 27(12):1866–1881

    Article  Google Scholar 

  67. Wang T (2011) CA-Tree: a hierarchical structure for efficient and scalable co-association-based cluster ensembles. IEEE Trans Syst Man Cybern Part B Cybern 41(3):686–698

    Article  Google Scholar 

  68. Wang X, Yang C, Zhou J (2009) Clustering aggregation by probability accumulation. Pattern Recognit 42(5):668–675

    Article  MATH  Google Scholar 

  69. Weiszfeld E, Plastria F (2009) On the point for which the sum of the distances to n given points is minimum. Ann Oper Res 167(1):7–41

    Article  MathSciNet  MATH  Google Scholar 

  70. Xie XL, Beni G (1991) A validity measure for fuzzy clustering. IEEE Trans Pattern Anal Mach Intell 13(4):841–846

    Article  Google Scholar 

  71. Yu Z, Li L, Gao Y, You J, Liu J, Wong HS, Han G (2014) Hybrid clustering solution selection strategy. Pattern Recognit 47(10):3362–3375

    Article  Google Scholar 

  72. Zare N, Shameli H, Parvin H (2017) An innovative natural-derived meta-heuristic optimization method. Appl Intell. https://doi.org/10.1007/s10489-016-0805-z (in press)

    Article  Google Scholar 

  73. Zhong C, Yue X, Zhang Z, Lei J (2015) A clustering ensemble: two-level-refined co-association matrix with path-based transformation. Pattern Recognit 48(8):2699–2709

    Article  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ayob Dehghan.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Nazari, A., Dehghan, A., Nejatian, S. et al. A comprehensive study of clustering ensemble weighting based on cluster quality and diversity. Pattern Anal Applic 22, 133–145 (2019). https://doi.org/10.1007/s10044-017-0676-x

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10044-017-0676-x

Keywords

Navigation