Abstract
A similarity graph represents the local characteristics of a data set, and it is used as input to various clustering methods including spectral, graph-based, and hierarchical clustering. Several similarity graphs exist in the literature; however, there is not a single similarity graph that can handle all kinds of cluster shapes and structures. In this study, motivated by the successful applications of ensemble approaches to clustering, a generic method for consensus similarity graph construction is proposed. The proposed approach first constructs multiple similarity graphs using bootstrap aggregating (bagging). Then, these graphs are fused into a consensus similarity graph using the normalized co-association matrix. We use k-nearest neighbor, \(\varepsilon\)-neighborhood, fully connected graph, and proximity graphs as the base similarity graphs. Moreover, the proposed approach is coupled with various clustering algorithms including spectral, graph-based, and hierarchical clustering. The experimental results with various spatial and real data sets demonstrate the effectiveness of the consensus similarity graphs in clustering. The proposed approach is also robust to local noise.













Similar content being viewed by others
References
Aggarwal CC, Reddy CK (2014) Data clustering: algorithms and applications. Chapman &Hall/CRC, USA
Von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
Tan P-N, Steinbach M, Kumar V (2013) Data mining cluster analysis: basic concepts and algorithms. Introduction to data mining, 487–533
İnkaya T (2015) A parameter-free similarity graph for spectral clustering. Expert Syst Appl 42(24):9489–9498
Nadler B, Galun M (2007) Fundamental limitations of spectral clustering. In: Advances in neural information processing systems, pp. 1017–1024
Ng AY, Jordan MI, Weiss Y (2002) On spectral clustering: analysis and an algorithm. In: Advances in neural information processing systems pp. 849–856
Zelnik-Manor L, Perona P (2004) Self-tuning spectral clustering. Adv Neural Inf Process Syst 17:1601–1608
Zhang X, Li J, Yu H (2011) Local density adaptive similarity measurement for spectral clustering. Pattern Recogn Lett 32(2):352–358
Correa CD, Lindstrom P (2012) Locally-scaled spectral clustering using empty region graphs. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 1330–1338
Mishra G, Mohanty SK (2020) Efficient construction of an approximate similarity graph for minimum spanning tree based clustering. Appl Soft Comput 97:106676
Chrysouli C, Tefas A (2015) Spectral clustering and semi-supervised learning using evolving similarity graphs. Appl Soft Comput 34:625–637
Zang W, Jiang Z, Ren L (2017) Improved spectral clustering based on density combining dna genetic algorithm. Int J Pattern Recognit Artif Intell 31(04):1750010
Tan M, Zhang S, Wu L (2020) Mutual knn based spectral clustering. Neural Comput Appl 32(11):6435–6442
Zhou Z-H (2019) Ensemble methods: foundations and algorithms. Chapman and Hall/CRC, USA
Vega-Pons S, Ruiz-Shulcloper J (2011) A survey of clustering ensemble algorithms. Int J Pattern Recognit Artif Intell 25(03):337–372
Zhu X, Change Loy C, Gong S (2014) Constructing robust affinity graphs for spectral clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1450–1457
Beauchemin M (2015) A density-based similarity matrix construction for spectral clustering. Neurocomputing 151:835–844
Carreira-Perpinán MA, Zemel RS (2005) Proximity graphs for clustering and manifold learning. Adv Neural Inf Process Syst 17:225–232
Premachandran V, Kakarala R (2013) Consensus of k-nns for robust neighborhood selection on graph-based manifolds. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1594–1601
Rokach L (2009) Taxonomy for characterizing ensemble methods in classification tasks: a review and annotated bibliography. Comput Stat Data Anal 53(12):4046–4072
Woźniak M, Grana M, Corchado E (2014) A survey of multiple classifier systems as hybrid systems. Inf Fusion 16:3–17
Boongoen T, Iam-On N (2018) Cluster ensembles: a survey of approaches with recent extensions and applications. Comput Sci Rev 28:1–25
Ghaemi R, Sulaiman MN, Ibrahim H, Mustapha N et al (2009) A survey: clustering ensembles techniques. World Acad Sci Eng Technol 50:636–645
Strehl A, Ghosh J (2002) Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617
Fred AL, Jain AK (2002) Data clustering using evidence accumulation. Object Recognit Support User Interact Service Robots 4:276–280
Fred AL, Jain AK (2005) Combining multiple clusterings using evidence accumulation. IEEE Trans Pattern Anal Mach Intell 27(6):835–850
Dudoit S, Fridlyand J (2003) Bagging to improve the accuracy of a clustering procedure. Bioinformatics 19(9):1090–1099
Fischer B, Buhmann JM (2003) Path-based clustering for grouping of smooth curves and texture segmentation. IEEE Trans Pattern Anal Mach Intell 25(4):513–518
Wang X, Yang C, Zhou J (2009) Clustering aggregation by probability accumulation. Pattern Recogn 42(5):668–675
Lourenço A, Bulo SR, Rebagliati N, Fred AL, Figueiredo MA, Pelillo M (2015) Probabilistic consensus clustering using evidence accumulation. Mach Learn 98(1):331–357
Huang D, Wang C-D, Lai J-H (2017) Locally weighted ensemble clustering. IEEE Trans Cybern 48(5):1460–1473
Zhong C, Hu L, Yue X, Luo T, Fu Q, Xu H (2019) Ensemble clustering based on evidence extracted from the co-association matrix. Pattern Recogn 92:93–106
Zhong C, Luo T, Yue X (2018) Cluster ensemble based on iteratively refined co-association matrix. IEEE Access 6:69210–69223
Zhong C, Yue X, Zhang Z, Lei J (2015) A clustering ensemble: Two-level-refined co-association matrix with path-based transformation. Pattern Recogn 48(8):2699–2709
Ayad HG, Kamel MS (2010) On voting-based consensus of cluster ensembles. Pattern Recogn 43(5):1943–1953
Fern XZ, Brodley CE (2004) Solving cluster ensemble problems by bipartite graph partitioning. In: Proceedings of the twenty-first international conference on machine learning, p. 36
Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392
Topchy A, Jain AK, Punch W (2005) Clustering ensembles: models of consensus and weak partitions. IEEE Trans Pattern Anal Mach Intell 27(12):1866–1881
Topchy A, Jain AK, Punch W (2004) A mixture model for clustering ensembles. In: Proceedings of the 2004 SIAM international conference on data mining, pp. 379–390
Mimaroglu S, Erdil E (2011) Combining multiple clusterings using similarity graph. Pattern Recogn 44(3):694–703
Hamidi SS, Akbari E, Motameni H (2019) Consensus clustering algorithm based on the automatic partitioning similarity graph. Data Knowl Eng 124:101754
Lu Y, Wan Y (2013) Pha: a fast potential-based hierarchical agglomerative clustering method. Pattern Recogn 46(5):1227–1239
Belhadj S, Attia A, Adnane BA, Ahmed-Foitih Z, Ahmed AT (2016) A novel epileptic seizure detection using fast potential-based hierarchical agglomerative clustering based on emd. Int J Comput Sci Netw Secur 16(5):7–12
Attia A, Frahta N, Moussaoui A, Belhadj S (2016) An efficient fmri data clustering method using pha algorithm and dynamic time warping. Int J Comput Sci Inf Secur 14(5):222–230
Cai Z, Yang X, Huang T, Zhu W (2020) A new similarity combining reconstruction coefficient with pairwise distance for agglomerative clustering. Inf Sci 508:173–182
Lu Y, Hou X, Chen X (2016) A novel travel-time based similarity measure for hierarchical clustering. Neurocomputing 173:3–8
Brito MR, Chávez EL, Quiroz AJ, Yukich JE (1997) Connectivity of the mutual k-nearest-neighbor graph in clustering and outlier detection. Stat Probab Lett 35(1):33–42
Duda R, Hart P, Stork D (2012) Pattern classification. Wiley, New York
Yao AC-C (1982) On constructing minimum spanning trees in k-dimensional spaces and related problems. SIAM J Comput 11(4):721–736
Toussaint GT (1980) The relative neighbourhood graph of a finite planar set. Pattern Recogn 12(4):261–268. https://doi.org/10.1016/0031-3203(80)90066-7
Gabriel KR, Sokal RR (1969) A new statistical approach to geographic variation analysis. Syst Zool 18(3):259–278. https://doi.org/10.2307/2412323
Topchy A, Minaei-Bidgoli B, Jain AK, Punch WF (2004) Adaptive clustering ensembles. In: Proceedings of the 17th international conference on pattern recognition. ICPR 2004, pp. 272–275
Casa A, Scrucca L, Menardi G (2021) Better than the best? answers via model ensemble in density-based clustering. Adv Data Anal Classif 15(3):599–623
Jain AK, Law MH (2005) Data clustering: A user’s dilemma. In: International conference on pattern recognition and machine intelligence, pp. 1–10. Springer
Liu D, Nosovskiy GV, Sourina O (2008) Effective clustering and boundary detection algorithm based on delaunay triangulation. Pattern Recogn Lett 29(9):1261–1273
Ultsch A (2005) Clustering wih som: U*c. In: Proceedings of the workshop on self-organizing maps, pp. 75–82
Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
Zahn CT (1971) Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans Comput 100(1):68–86
Han J, Pei J, Kamber M (2011) Data mining: concepts and techniques. Morgan Kaufmann, Burlington
Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846–850
Fred A, Jain AK (2003) Robust data clustering. In: Proceedings of the 2003 IEEE Computer society conference on computer vision and pattern recognition, pp. II–II
Wilcoxon F (1992) Individual comparisons by ranking methods. In: Johnson NL (ed) Breakthroughs in statistics. Springer, New York, pp 196–202
Bauer E, Kohavi R (1999) An empirical comparison of voting classification algorithms: bagging, boosting, and variants. Mach Learn 36(1):105–139
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The author has no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Appendix
This appendix includes the clustering results for the real data sets in Table 4.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
İnkaya, T. Consensus similarity graph construction for clustering. Pattern Anal Applic 26, 703–733 (2023). https://doi.org/10.1007/s10044-022-01116-w
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10044-022-01116-w