Consensus similarity graph construction for clustering

İnkaya, Tülin

doi:10.1007/s10044-022-01116-w

Short Paper
Published: 27 November 2022

Pattern Analysis and Applications, Volume 26, pages 703–733 (2023)

Tülin İnkaya ORCID: orcid.org/0000-0002-6260-0162¹

Abstract

A similarity graph represents the local characteristics of a data set, and it is used as input to various clustering methods including spectral, graph-based, and hierarchical clustering. Several similarity graphs exist in the literature; however, there is not a single similarity graph that can handle all kinds of cluster shapes and structures. In this study, motivated by the successful applications of ensemble approaches to clustering, a generic method for consensus similarity graph construction is proposed. The proposed approach first constructs multiple similarity graphs using bootstrap aggregating (bagging). Then, these graphs are fused into a consensus similarity graph using the normalized co-association matrix. We use k-nearest neighbor, $\varepsilon$-neighborhood, fully connected graph, and proximity graphs as the base similarity graphs. Moreover, the proposed approach is coupled with various clustering algorithms including spectral, graph-based, and hierarchical clustering. The experimental results with various spatial and real data sets demonstrate the effectiveness of the consensus similarity graphs in clustering. The proposed approach is also robust to local noise.
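
The two steps described above (bagging, then fusion through a normalized co-association matrix) can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names `knn_adjacency` and `consensus_graph`, the parameter defaults, the use of a symmetric k-nearest neighbor graph as the only base graph, and the dropping of duplicate points within a bootstrap sample are all assumptions. The normalization is also an assumption: the weight of a pair is the number of bootstrap graphs in which the two points are connected, divided by the number of bootstrap samples that contain both points.

```python
import numpy as np


def knn_adjacency(points, k):
    """Symmetric 0/1 k-nearest-neighbor adjacency matrix (assumed base graph)."""
    n = len(points)
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    k = min(k, n - 1)
    nearest = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n), dtype=int)
    rows = np.repeat(np.arange(n), k)
    adj[rows, nearest.ravel()] = 1
    return np.maximum(adj, adj.T)  # keep an edge if either point lists the other


def consensus_graph(points, k=3, n_graphs=50, seed=0):
    """Fuse bootstrap k-NN graphs into one weighted graph (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = len(points)
    edge_count = np.zeros((n, n))  # times i and j were connected
    pair_count = np.zeros((n, n))  # times i and j were sampled together
    for _ in range(n_graphs):
        # Bootstrap sample drawn with replacement; duplicates dropped (assumption).
        sample = np.unique(rng.integers(0, n, size=n))
        adj = knn_adjacency(points[sample], k)
        idx = np.ix_(sample, sample)
        edge_count[idx] += adj
        pair_count[idx] += 1
    # Normalize by co-occurrence so rarely sampled pairs are not penalized.
    return np.divide(edge_count, pair_count,
                     out=np.zeros_like(edge_count), where=pair_count > 0)


# Usage: two well-separated synthetic groups of ten points each.
data_rng = np.random.default_rng(1)
points = np.vstack([data_rng.normal(0.0, 0.5, size=(10, 2)),
                    data_rng.normal(10.0, 0.5, size=(10, 2))])
W = consensus_graph(points, k=3, n_graphs=50, seed=0)
```

The resulting matrix `W` is symmetric with entries in [0, 1], so it could be passed as a precomputed affinity to a spectral, graph-based, or hierarchical clustering routine. Other base graphs named in the abstract, such as the $\varepsilon$-neighborhood graph, would replace `knn_adjacency` in this sketch.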

Author information

Authors and Affiliations

Industrial Engineering Department, Bursa Uludağ University, Görükle Campus, 16059, Bursa, Turkey
Tülin İnkaya

Corresponding author

Correspondence to Tülin İnkaya.

Ethics declarations

Conflict of interest

The author has no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

This appendix includes the clustering results for the real data sets in Table 4.

Table 4 Clustering results for the real data sets

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

İnkaya, T. Consensus similarity graph construction for clustering. Pattern Anal Applic 26, 703–733 (2023). https://doi.org/10.1007/s10044-022-01116-w

Received: 06 December 2021
Accepted: 17 October 2022
Published: 27 November 2022
Issue Date: May 2023
DOI: https://doi.org/10.1007/s10044-022-01116-w