
Clustering ensemble selection considering quality and diversity

Artificial Intelligence Review

Abstract

A partition that a stability measure judges to be poor may still contain one or more high-quality clusters, which are then discarded along with it. Motivated by this, and building on the evaluation of partitions, researchers have turned to defining measures that evaluate individual clusters. Many stability measures, such as Normalized Mutual Information (NMI), have been proposed to validate a partition, and the cluster-level measures are commonly based on NMI as well. This paper discusses the drawback of this common approach and proposes a criterion, called Edited Normalized Mutual Information (ENMI), for assessing the association between a cluster and a partition. The ENMI criterion compensates for the drawback of the common NMI measure. A clustering ensemble method based on aggregating a subset of primary clusters is also proposed. The proposed method uses the Average ENMI as a fitness measure to select clusters: those that satisfy a predefined threshold on this measure are selected to participate in the final ensemble. To combine the chosen clusters, a set of consensus functions is employed. The first class consists of co-association based consensus functions. Since the Evidence Accumulation Clustering (EAC) method cannot derive the co-association matrix from a subset of clusters, Extended EAC (EEAC) is employed to construct the co-association matrix from the chosen subset. The second class is based on hypergraph partitioning algorithms. The third class treats the chosen clusters as a new feature space and uses a simple clustering algorithm to extract the consensus partition. Empirical studies show that the proposed method outperforms other well-known ensembles.
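The selection and co-association steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `select_clusters` and `coassociation`, the example scores (placeholders standing in for Average ENMI values, whose formula the abstract does not give), and the normalization by `max(n_i, n_j)` are all assumptions made for illustration.

```python
from itertools import combinations


def select_clusters(clusters, scores, threshold):
    """Keep the clusters whose score meets the threshold.

    `scores` are hypothetical per-cluster fitness values; in the paper this
    role is played by the Average ENMI, which is not reproduced here.
    """
    return [c for c, s in zip(clusters, scores) if s >= threshold]


def coassociation(selected, n_points):
    """Build a co-association matrix from a subset of clusters.

    Assumed normalization (not taken from the abstract): the number of
    selected clusters containing both points, divided by the larger of the
    two points' individual appearance counts.
    """
    appear = [0] * n_points
    together = [[0] * n_points for _ in range(n_points)]
    for cluster in selected:
        members = sorted(set(cluster))
        for i in members:
            appear[i] += 1
        for i, j in combinations(members, 2):
            together[i][j] += 1
            together[j][i] += 1

    matrix = [[0.0] * n_points for _ in range(n_points)]
    for i in range(n_points):
        matrix[i][i] = 1.0  # a point is always grouped with itself
        for j in range(n_points):
            denom = max(appear[i], appear[j])
            if i != j and denom > 0:
                matrix[i][j] = together[i][j] / denom
    return matrix


# Five primary clusters over points 0..4, with made-up scores.
clusters = [{0, 1, 2}, {3, 4}, {0, 1}, {2, 3, 4}, {1, 3}]
scores = [0.9, 0.8, 0.7, 0.4, 0.1]

selected = select_clusters(clusters, scores, threshold=0.5)
matrix = coassociation(selected, n_points=5)
```

Because only some clusters survive selection, a point may appear in fewer selected clusters than there are partitions, so dividing by a fixed ensemble size, as plain EAC does, no longer makes sense. The per-pair denominator used here is one way to account for that.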



Author information

Corresponding author

Correspondence to Samad Nejatian.


Cite this article

Abbasi, So., Nejatian, S., Parvin, H. et al. Clustering ensemble selection considering quality and diversity. Artif Intell Rev 52, 1311–1340 (2019). https://doi.org/10.1007/s10462-018-9642-2


  • DOI: https://doi.org/10.1007/s10462-018-9642-2
