Clustering ensemble selection considering quality and diversity

Abbasi, Sadr-olah; Nejatian, Samad; Parvin, Hamid; Rezaie, Vahideh; Bagherifard, Karamolah

doi:10.1007/s10462-018-9642-2

Published: 21 June 2018

Artificial Intelligence Review, volume 52, pages 1311–1340 (2019)

Sadr-olah Abbasi^1, Samad Nejatian^2,3, Hamid Parvin^4,5, Vahideh Rezaie^3,6 & Karamolah Bagherifard^1,3

Abstract

A partition that a stability measure judges to be poor may nevertheless contain one or more high-quality clusters, which are then discarded along with the rest of the partition. Motivated by this, researchers have moved from evaluating whole partitions to defining measures that evaluate individual clusters. Many stability measures, such as Normalized Mutual Information (NMI), have been proposed to validate a partition, and the cluster-level measures defined so far are also based on NMI. This paper discusses the drawback of this commonly used approach and proposes a criterion, called Edited Normalized Mutual Information (ENMI), for assessing the association between a cluster and a partition. The ENMI criterion compensates for the drawback of the common NMI measure. A clustering ensemble method based on aggregating a subset of primary clusters is also proposed. The proposed method uses the Average ENMI as a fitness measure for selecting clusters: the clusters that satisfy a predefined threshold on this measure are selected to participate in the final ensemble. To combine the chosen clusters, a set of consensus functions is employed. The first class consists of co-association based consensus functions. Since the Evidence Accumulation Clustering (EAC) method cannot derive the co-association matrix from a subset of clusters, Extended EAC (EEAC) is employed to construct the co-association matrix from the chosen subset of clusters. The second class is based on hypergraph partitioning algorithms. The third class treats the chosen clusters as a new feature space and uses a simple clustering algorithm to extract the consensus partition. Empirical studies show that the proposed method outperforms other well-known ensembles.
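
For readers unfamiliar with the two building blocks the abstract names, the following is a minimal sketch of the standard versions only: NMI between two whole partitions (here normalized by the geometric mean of the entropies) and the ordinary EAC co-association matrix built from complete partitions. It is not the paper's ENMI criterion or its EEAC construction, neither of which is specified in the abstract. The function names `entropy`, `nmi` and `co_association` are chosen here for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of cluster labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """Standard NMI of two partitions, normalized by sqrt(H(a) * H(b))."""
    if len(a) != len(b):
        raise ValueError("both partitions must label the same objects")
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))  # co-occurrence counts of label pairs
    mutual_info = sum(
        (n_ij / n) * math.log(n_ij * n / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )
    denom = math.sqrt(entropy(a) * entropy(b))
    # Assumed convention: a single-cluster partition has zero entropy, so return 0.
    return mutual_info / denom if denom > 0 else 0.0


def co_association(partitions):
    """Standard EAC matrix: fraction of partitions in which each pair of
    objects falls in the same cluster (all partitions cover all objects)."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]
```

Because NMI depends only on how objects are grouped, two partitions that differ only in label names score as identical. The co-association matrix can then be handed to any similarity-based clustering algorithm (for example, hierarchical linkage) to obtain a consensus partition.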

Author information

Authors and Affiliations

1. Department of Computer Engineering, Yasooj Branch, Islamic Azad University, Yasooj, Iran: Sadr-olah Abbasi & Karamolah Bagherifard
2. Department of Electrical Engineering, Yasooj Branch, Islamic Azad University, Yasooj, Iran: Samad Nejatian
3. Young Researchers and Elite Club, Yasooj Branch, Islamic Azad University, Yasooj, Iran: Samad Nejatian, Vahideh Rezaie & Karamolah Bagherifard
4. Department of Computer Engineering, Nourabad Mamasani Branch, Islamic Azad University, Nourabad Mamasani, Iran: Hamid Parvin
5. Young Researchers and Elite Club, Nourabad Mamasani Branch, Islamic Azad University, Nourabad Mamasani, Iran: Hamid Parvin
6. Department of Mathematics, Yasooj Branch, Islamic Azad University, Yasooj, Iran: Vahideh Rezaie

Corresponding author

Correspondence to Samad Nejatian.

About this article

Cite this article

Abbasi, So., Nejatian, S., Parvin, H. et al. Clustering ensemble selection considering quality and diversity. Artif Intell Rev 52, 1311–1340 (2019). https://doi.org/10.1007/s10462-018-9642-2

Issue Date: 15 August 2019
