Abstract
Clustering as a major task in data mining is responsible for discovering hidden patterns in unlabeled datasets. Finding the best clustering is also considered as one of the most challenging problems in data mining. Due to the problem complexity and the weaknesses of primary clustering algorithm, a large part of research has been directed toward ensemble clustering methods. Ensemble clustering aggregates a pool of base clusterings and produces an output clustering that is also named consensus clustering. The consensus clustering is usually better clustering than the output clusterings of the basic clustering algorithms. However, lack of quality in base clusterings makes their consensus clustering weak. In spite of some researches in selection of a subset of high quality base clusterings based on a clustering assessment metric, cluster-level selection has been always ignored. In this paper, a new clustering ensemble framework has been proposed based on cluster-level weighting. The certainty amount that the given ensemble has about a cluster is considered as the reliability of that cluster. The certainty amount that the given ensemble has about a cluster is computed by the accretion amount of that cluster by the ensemble. Then by selecting the best clusters and assigning a weight to each selected cluster based on its reliability, the final ensemble is created. After that, the paper proposes cluster-level weighting co-association matrix instead of traditional co-association matrix. Then, two consensus functions have been introduced and used for production of the consensus partition. The proposed framework completely overshadows the state-of-the-art clustering ensemble methods experimentally.
Similar content being viewed by others
References
Alizadeh H, Minaei-Bidgoli B, Amirgholipour SK (2009) A new method for improving the performance of K nearest neighbor using clustering technique. Int J Converg Inf Technol JCIT. ISSN: 1975-9320
Alizadeh H, Minaei-Bidgoli B, Parvin H (2013) Optimizing fuzzy cluster ensemble in string representation. IJPRAI 27(2). https://doi.org/10.1142/S0218001413500055
Alizadeh H, Minaei-Bidgoli B, Parvin H (2014) Cluster ensemble selection based on a new cluster stability measure. Intell Data Anal 18(3):389–408
Alizadeh H, Minaei-Bidgoli B, Parvin H (2014) To improve the quality of cluster ensembles by selecting a subset of base clusters. J Exp Theor Artif Intell 26(1):127–150
Alizadeh H, Yousefnezhad M, Minaei-Bidgoli B (2015) Wisdom of crowds cluster ensemble. Intell Data Anal 19(3):485–503
Aminsharifi A, Irani D, Pooyesh S, Parvin H, Dehghani S, Yousofi K, Fazel E, Zibaie F (2017) Artificial neural network system to predict the postoperative outcome of percutaneous nephrolithotomy. J Endourol 31(5):461–467
Ana LNF, Jain AK (2003) Robust data clustering. In: Proceedings. 2003 IEEE computer society conference on computer vision and pattern recognition, 2003, vol 2. IEEE, pp II-128
Ayad HG, Kamel MS (2008) Cumulative voting consensus method for partitions with a variable number of clusters. IEEE Trans Pattern Anal Mach Intell 30(1):160–173
Charon I, Denoeud L, Guénoche A, Hudry O (2006) Maximum transfer distance between partitions. J Classif 23(1):103–121
Coretto P, Hennig C (2010) A simulation study to compare robust clustering methods based on mixtures. Adv Data Anal Classif 4:111–135
Cristofor D, Simovici D (2002) Finding median partitions using information-theoretical-based genetic algorithms. J Univ Comput Sci 8(2):153–172
Denoeud L (2008) Transfer distance between partitions. Adv Data Anal Classif 2:279–294
Di Gesù V (1994) Integrated fuzzy clustering. Fuzzy Sets Syst 68(3):293–308
Domeniconi C, Al-Razgan M (2009) Weighted cluster ensembles: methods and analysis. ACM Trans Knowl Discov Data (TKDD) 2(4):17
Dueck D (2009) Affinity propagation: clustering data by passing messages. Ph.D. dissertation, University of Toronto
Faceli K, Marcilio CP, Souto D (2006) Multi-objective clustering ensemble. In: Proceedings of the sixth international conference on hybrid intelligent systems (HIS’06)
Fern XZ, Brodley CE (2003) Random projection for high dimensional data clustering: a cluster ensemble approach. In: ICML, vol 3, pp 186–193
Fern XZ, Brodley CE (2004) Solving cluster ensemble problems by bipartite graph partitioning. In Proceedings of international conference on machine learning (ICML)
Fern XZ, Lin W (2008) Cluster ensemble selection. In: SIAM international conference on data mining
Franek L, Jiang X (2014) Ensemble clustering by means of clustering embedding in vector spaces. Pattern Recognit 47(2):833–842
Fred A (2001) Finding consistent clusters in data partitions. In: Multiple classifier systems. Springer, Berlin, Heidelberg, pp 309–318
Fred A, Jain AK (2002) Data clustering using evidence accumulation. In: Proceedings of the 16th international conference on pattern recognition, ICPR02, Quebec City, pp 276–280
Fred ALN, Jain AK (2005) Combining multiple clusterings using evidence accumulation. IEEE Trans Pattern Anal Mach Intell 27(6):835–850
Fred A, Jain AK (2006) Learning pairwise similarity for data clustering. In: International conference on pattern recognition
Fred A, Lourenco A (2008) Cluster ensemble methods: from single clusterings to combined solutions. Stud Comput Intell (SCI) 126:3–30
Friedman JH, Meulman JJ (2002) Clustering objects on subsets of attributes. Technical report, Stanford University
García-Escudero LA, Gordaliza A, Matrán C, Mayo-Iscar A (2010) A review of robust clustering methods. Adv Data Anal Classif 4:89–109
Guénoche A (2011) Consensus of partitions: a constructive approach. Adv Data Anal Classif 5:215–229
Gullo F, Tagarelli A, Greco S (2009) Diversity-based weighting schemes for clustering ensembles. SIAM, pp 437–448
Hennig C (2008) Dissolution point and isolation robustness: robustness criteria for general cluster analysis methods. J Multivar Anal 99:1154–1176
Hu X, Yoo I (2004) Cluster ensemble and its applications in gene expression analysis. In: Proceedings of the second conference on Asia-Pacific bioinformatics-Volume 29. Australian Computer Society, Inc, pp 297–302
Huang D, Lai JH, Wang CD (2015) Combining multiple clusterings via crowd agreement estimation and multi-granularity link analysis. Neurocomputing 170:240–250
Iam-On N, Boongoen T, Garrett S (2008) Refining pairwise similarity matrix for cluster ensemble problem with cluster relations. In: Proceedings of international conference on discovery science (ICDS), pp 222–233
Iam-On N, Boongoen T, Garrett S, Price C (2011) A link-based approach to the cluster ensemble problem. IEEE Trans Pattern Anal Mach Intell 33(12):2396–2409
Jain AK (2010) Data clustering: 50 years beyond k-means. Pattern Recognit Lett 31(8):651–666
Jamalinia H, Khalouei S, Rezaie V, Nejatian S, Bagheri-Fard K, Parvin H (2017) Diverse classifier ensemble creation based on heuristic dataset modification. J Appl Stat. https://doi.org/10.1080/02664763.2017.1363163 (in press)
Kleinberg J (2002) An impossibility theorem for clustering. In: Proceedings of Neural Information Processing Systems'02 (NIPS 2002). pp 446–453
Kuncheva LI (2004) Combining pattern classifiers, methods and algorithms. Wiley, New York
Kuncheva LI, Hadjitodorov ST (2004) Using diversity in cluster ensembles. In: 2004 IEEE international conference on systems, man and cybernetics, vol 2. IEEE, pp 1214–1219
Law MHC, Topchy AP, Jain AK (2004) Multiobjective data clustering. In: IEEE conference on computer vision and pattern recognition, vol 2, pp 424–430
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
Li T, Ding C (2008) Weighted consensus clustering. In: Proceedings of SIAM international conference on data mining (SDM)
Li Z, Wu XM, Chang SF (2012) Segmentation using superpixels: a bipartite graph partitioning approach. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR)
Lior R, Maimon O (2005) Data mining and knowledge discovery handbook. Springer, Berlin
Minaei-Bidgoli B, Parvin H, Alinejad-Rokny H, Alizadeh H, Punch WF (2014) Effects of resampling method and adaptation on clustering ensemble efficacy. Artif Intell Rev 41(1):27–48
Mohammadi Jenghara M, Ebrahimpour-Komleh H, Parvin H (2017) Dynamic protein–protein interaction networks construction using firefly algorithm. Pattern Anal Appl. https://doi.org/10.1007/s10044-017-0626-7 (in press)
Mohammadi Jenghara M, Ebrahimpour-komleh H, Rezaie V, Nejatian S, Parvin H, Syed-Yusof SK (2017) Imputing missing value through ensemble concept based on statistical measures. Knowl Inf Syst. https://doi.org/10.1007/s10115-017-1118-1 (in press)
Monti S, Tamayo P, Mesirov J, Golub T (2003) Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 52(1–2):91–118
Nejatian S, Omidvar R, Mohamadi H, Eskandar-Baghbani A, Rezaie V, Parvin H (2017) An optimization algorithm based on behavior of see-see partridge chicks. J Intell Fuzzy Syst. https://doi.org/10.3233/JIFS-161718 (in press)
Nejatian S, Parvin H, Faraji E (2017) Using sub-sampling and ensemble clustering techniques to improve performance of imbalanced classification. Neurocomputing. https://doi.org/10.1016/j.neucom.2017.06.082 (in press)
Newman CBDJ, Hettich S, Merz C (1998) UCI repository of machine learning databases. http://www.ics.uci.edu/˜mlearn/MLSummary.html
Parvin H, Minaei-Bidgoli B (2013) A clustering ensemble framework based on elite selection of weighted clusters. Adv Data Anal Classif 7(2):181–208
Parvin H, Minaei-Bidgoli B (2015) A clustering ensemble framework based on selection of fuzzy weighted clusters in a locally adaptive clustering algorithm. Pattern Anal Appl 18(1):87–112
Parvin H, Alizadeh H, Minaei-Bidgoli B (2009a) A new method for constructing classifier ensembles. Int J Digit Content Technol Appl JDCTA. ISSN: 1975-9339 (in press)
Parvin H, Alizadeh H, Minaei-Bidgoli B (2009b) Using clustering for generating diversity in classifier ensemble. Int J Digit Content Technol Appl JDCTA 3(1):51–57. ISSN: 1975-9339
Parvin H, Beigi A, Mozayani N (2012) A clustering ensemble learning method based on the ant colony clustering algorithm. Int J Appl Comput Math 11(2):286–302
Parvin H, Alinejad-Rokny H, Minaei-Bidgoli B, Parvin S (2013) A new classifier ensemble methodology based on subspace learning. J Exp Theor Artif Intell 25(2):227–250
Parvin H, Minaei-Bidgoli B, Alinejad-Rokny H, Punch WF (2013) Data weighing mechanisms for clustering ensembles. Comput Electr Eng 39(5):1433–1450
Parvin H, Mirnabibaboli M, Alinejad-Rokny H (2015) Proposing a classifier ensemble framework based on classifier selection and decision tree. Eng Appl AI 37:34–42
Schynsa M, Haesbroeck G, Critchley F (2010) RelaxMCD: smooth optimisation for the minimum covariance determinant estimator. Comput Stat Data Anal 54:843–857
Sevillano X, Cobo G, Alías F, Socoró JC (2006) Feature diversity in cluster ensembles for robust document clustering. In: Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 697–698
Strehl A, Ghosh J (2002) Cluster ensembles-a knowledge reuse framework for combining partitionings. In: AAAI/IAAI, pp 93–99
Strehl A, Ghosh J (2003) Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617
Tavana M, Parvin H, Rezazadeh F (2017) Parkinson detection: an image processing approach. J Med Imaging Health Inf 7:464–472
Topchy A, Jain AK, Punch WF (2003) Combining multiple weak clusterings. In: Proceedings of the 3rd IEEE international conference on data mining, pp 331–338
Topchy A, Jain AK, Punch W (2005) Clustering ensembles: models of consensus and weak partitions. IEEE Trans Pattern Anal Mach Intell 27(12):1866–1881
Wang T (2011) CA-Tree: a hierarchical structure for efficient and scalable co-association-based cluster ensembles. IEEE Trans Syst Man Cybern Part B Cybern 41(3):686–698
Wang X, Yang C, Zhou J (2009) Clustering aggregation by probability accumulation. Pattern Recognit 42(5):668–675
Weiszfeld E, Plastria F (2009) On the point for which the sum of the distances to n given points is minimum. Ann Oper Res 167(1):7–41
Xie XL, Beni G (1991) A validity measure for fuzzy clustering. IEEE Trans Pattern Anal Mach Intell 13(4):841–846
Yu Z, Li L, Gao Y, You J, Liu J, Wong HS, Han G (2014) Hybrid clustering solution selection strategy. Pattern Recognit 47(10):3362–3375
Zare N, Shameli H, Parvin H (2017) An innovative natural-derived meta-heuristic optimization method. Appl Intell. https://doi.org/10.1007/s10489-016-0805-z (in press)
Zhong C, Yue X, Zhang Z, Lei J (2015) A clustering ensemble: two-level-refined co-association matrix with path-based transformation. Pattern Recognit 48(8):2699–2709
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Nazari, A., Dehghan, A., Nejatian, S. et al. A comprehensive study of clustering ensemble weighting based on cluster quality and diversity. Pattern Anal Applic 22, 133–145 (2019). https://doi.org/10.1007/s10044-017-0676-x
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10044-017-0676-x