A comprehensive study of clustering ensemble weighting based on cluster quality and diversity

Nazari, Ahmad; Dehghan, Ayob; Nejatian, Samad; Rezaie, Vahideh; Parvin, Hamid

doi:10.1007/s10044-017-0676-x

Theoretical Advances
Published: 29 December 2017

Pattern Analysis and Applications, volume 22, pages 133–145 (2019)

Ahmad Nazari¹, Ayob Dehghan¹, Samad Nejatian²,³, Vahideh Rezaie³,⁴ & Hamid Parvin¹,⁵

Abstract

Clustering, a major task in data mining, discovers hidden patterns in unlabeled datasets. Finding the best clustering is considered one of the most challenging problems in the field. Because of the complexity of the problem and the weaknesses of individual clustering algorithms, a large body of research has been directed toward ensemble clustering methods. Ensemble clustering aggregates a pool of base clusterings and produces a single output clustering, called the consensus clustering, which is usually better than the outputs of the base clustering algorithms. However, low-quality base clusterings weaken the consensus clustering. Although some studies have selected a subset of high-quality base clusterings according to a clustering assessment metric, selection at the level of individual clusters has been ignored. This paper proposes a new clustering ensemble framework based on cluster-level weighting. The reliability of a cluster is defined as the certainty that the given ensemble has about that cluster, which is computed from the amount of accretion of the cluster by the ensemble. The final ensemble is then created by selecting the best clusters and assigning each selected cluster a weight based on its reliability. Next, the paper proposes a cluster-level weighted co-association matrix in place of the traditional co-association matrix. Finally, two consensus functions are introduced and used to produce the consensus partition. In the experiments, the proposed framework outperforms the state-of-the-art clustering ensemble methods.
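The pipeline outlined in the abstract (score each cluster, keep the reliable ones, accumulate their weights into a co-association matrix) can be illustrated with a minimal, dependency-free Python sketch. The abstract does not give the reliability formula, so the `reliability` function below is a stand-in assumption: it measures how intact a cluster stays across the other base clusterings. The function names, the `threshold` parameter, and the toy ensemble are likewise hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def clusters_of(labels):
    """Group point indices by cluster label for one base clustering."""
    groups = defaultdict(set)
    for idx, lab in enumerate(labels):
        groups[lab].add(idx)
    return list(groups.values())


def reliability(cluster, other_partitions):
    """Stand-in reliability (an assumption, not the paper's measure):
    the mean, over the other base clusterings, of the largest fraction of
    `cluster` that falls inside a single cluster of that clustering."""
    if not other_partitions:
        return 1.0
    scores = []
    for part in other_partitions:
        best = max(len(cluster & c) for c in part)
        scores.append(best / len(cluster))
    return sum(scores) / len(scores)


def weighted_co_association(ensemble, threshold=0.0):
    """Build a cluster-level weighted co-association matrix.

    ensemble  -- list of label lists, one per base clustering
    threshold -- clusters whose reliability is below this are dropped
    """
    n = len(ensemble[0])
    parts = [clusters_of(labels) for labels in ensemble]
    matrix = [[0.0] * n for _ in range(n)]
    for p, part in enumerate(parts):
        others = parts[:p] + parts[p + 1:]
        for cluster in part:
            w = reliability(cluster, others)
            if w < threshold:
                continue  # cluster-level selection: skip unreliable clusters
            # Every pair of points in a kept cluster gains that cluster's weight.
            for i in cluster:
                for j in cluster:
                    matrix[i][j] += w
    m = len(ensemble)
    return [[v / m for v in row] for row in matrix]


# Toy ensemble: three base clusterings of six points (hypothetical data).
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
]
ca = weighted_co_association(ensemble)
ca_selected = weighted_co_association(ensemble, threshold=0.6)
```

In this toy ensemble, the cluster {2, 3} of the third clustering is split by both other clusterings, so it gets a low weight and contributes little to `ca`; with `threshold=0.6` it is discarded entirely. A consensus partition could then be obtained by running any similarity-based clustering algorithm, for example average-linkage hierarchical clustering, on the resulting matrix. That final step is again an assumption here, since the abstract does not describe the two consensus functions.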

Author information

Authors and Affiliations

Department of Computer Engineering, Yasooj Branch, Islamic Azad University, Yasooj, Iran
Ahmad Nazari, Ayob Dehghan & Hamid Parvin
Department of Electrical Engineering, Yasooj Branch, Islamic Azad University, Yasooj, Iran
Samad Nejatian
Young Researchers and Elite Club, Yasooj Branch, Islamic Azad University, Yasooj, Iran
Samad Nejatian & Vahideh Rezaie
Department of Mathematics, Yasooj Branch, Islamic Azad University, Yasooj, Iran
Vahideh Rezaie
Young Researchers and Elite Club, Nourabad Mamasani Branch, Islamic Azad University, Nourabad, Mamasani, Iran
Hamid Parvin

Corresponding author

Correspondence to Ayob Dehghan.

About this article

Cite this article

Nazari, A., Dehghan, A., Nejatian, S. et al. A comprehensive study of clustering ensemble weighting based on cluster quality and diversity. Pattern Anal Applic 22, 133–145 (2019). https://doi.org/10.1007/s10044-017-0676-x

Received: 04 April 2017
Accepted: 11 December 2017
Published: 29 December 2017
Issue Date: 05 February 2019
DOI: https://doi.org/10.1007/s10044-017-0676-x
