Abstract
Many stability measures, such as Normalized Mutual Information (NMI), have been proposed to validate a partition. It is highly likely, however, that some partition judged poor by a stability measure nevertheless contains one or more high-quality clusters, which are then discarded entirely. Inspired by the evaluation of partitions, researchers have therefore turned to defining measures that evaluate individual clusters, and these measures are commonly based on NMI. This paper discusses the drawback of that common approach and proposes a criterion, called Edited Normalized Mutual Information (ENMI), for assessing the association between a cluster and a partition. The ENMI criterion compensates for the drawback of the common NMI measure. A clustering ensemble method based on aggregating a subset of primary clusters is also proposed. The method uses the Average ENMI as a fitness measure to select clusters: those that satisfy a predefined threshold on this measure participate in the final ensemble. Several consensus functions are employed to combine the chosen clusters. The first class is based on the co-association matrix. Because the Evidence Accumulation Clustering (EAC) method cannot derive the co-association matrix from a subset of clusters, Extended EAC (EEAC) is employed to construct it from the chosen subset. The second class is based on hypergraph partitioning algorithms. The third class treats the chosen clusters as a new feature space and uses a simple clustering algorithm to extract the consensus partitioning. Empirical studies show that the proposed method outperforms other well-known ensembles.
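The pipeline summarized in the abstract (score clusters, keep those above a threshold, build a co-association matrix from the kept subset) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define ENMI, so the per-cluster scores below are hypothetical placeholders standing in for Average ENMI. The function names (`nmi`, `select_clusters`, `co_association`) are illustrative, and the normalisation used in `co_association` (shared appearances divided by the larger of the two points' appearance counts) is an assumption about how a subset-based co-association matrix might be built. Only the standard NMI, with mutual information divided by the geometric mean of the two entropies, is computed exactly.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """Standard NMI between two partitions: I(a; b) / sqrt(H(a) * H(b))."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mutual = sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    denom = math.sqrt(entropy(a) * entropy(b))
    return mutual / denom if denom > 0 else 0.0


def select_clusters(clusters, scores, threshold):
    """Keep the clusters whose score meets the threshold.

    `scores` is a placeholder for a per-cluster fitness such as Average ENMI.
    """
    return [c for c, s in zip(clusters, scores) if s >= threshold]


def co_association(clusters, n_points):
    """Pairwise co-association from a subset of clusters (sets of point indices).

    Assumed normalisation: the number of selected clusters containing both
    points, divided by the larger of the two points' appearance counts.
    """
    appear = [0] * n_points
    shared = [[0] * n_points for _ in range(n_points)]
    for cluster in clusters:
        for i in cluster:
            appear[i] += 1
            for j in cluster:
                shared[i][j] += 1
    matrix = []
    for i in range(n_points):
        row = []
        for j in range(n_points):
            top = max(appear[i], appear[j])
            row.append(shared[i][j] / top if top else 0.0)
        matrix.append(row)
    return matrix


# Hypothetical primary clusters over five points, with made-up scores.
clusters = [{0, 1}, {2, 3}, {0, 2}]
scores = [0.9, 0.8, 0.2]
chosen = select_clusters(clusters, scores, threshold=0.5)
matrix = co_association(chosen, n_points=5)
```

In this toy run, point 4 belongs to no selected cluster, so its row of the matrix is all zeros. This is the situation in which a co-association matrix built from a subset of clusters has to handle points with unequal coverage.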
Cite this article
Abbasi, So., Nejatian, S., Parvin, H. et al. Clustering ensemble selection considering quality and diversity. Artif Intell Rev 52, 1311–1340 (2019). https://doi.org/10.1007/s10462-018-9642-2
DOI: https://doi.org/10.1007/s10462-018-9642-2