
Consensus similarity graph construction for clustering

  • Short Paper
  • Pattern Analysis and Applications

Abstract

A similarity graph represents the local characteristics of a data set, and it is used as input to various clustering methods including spectral, graph-based, and hierarchical clustering. Several similarity graphs exist in the literature; however, there is not a single similarity graph that can handle all kinds of cluster shapes and structures. In this study, motivated by the successful applications of ensemble approaches to clustering, a generic method for consensus similarity graph construction is proposed. The proposed approach first constructs multiple similarity graphs using bootstrap aggregating (bagging). Then, these graphs are fused into a consensus similarity graph using the normalized co-association matrix. We use k-nearest neighbor, \(\varepsilon\)-neighborhood, fully connected graph, and proximity graphs as the base similarity graphs. Moreover, the proposed approach is coupled with various clustering algorithms including spectral, graph-based, and hierarchical clustering. The experimental results with various spatial and real data sets demonstrate the effectiveness of the consensus similarity graphs in clustering. The proposed approach is also robust to local noise.
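The two steps named in the abstract (bagged base graphs, then fusion through a normalized co-association matrix) can be illustrated with a minimal sketch. It is not the author's implementation: the function names `knn_adjacency` and `consensus_graph`, the parameters `k`, `n_graphs`, and `seed`, the choice of a symmetric k-nearest neighbor graph as the only base graph, and the normalization (edge votes divided by the number of bootstrap samples in which both points appear) are all assumptions made for illustration.

```python
import numpy as np


def knn_adjacency(points, k):
    """Symmetric 0/1 k-nearest-neighbor adjacency matrix (Euclidean distance)."""
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)           # a point is never its own neighbor
    k = min(k, n - 1)
    nearest = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n), dtype=int)
    rows = np.repeat(np.arange(n), k)
    adj[rows, nearest.ravel()] = 1
    return np.maximum(adj, adj.T)            # keep an edge if either endpoint chose it


def consensus_graph(points, k=3, n_graphs=50, seed=0):
    """Fuse bagged kNN graphs into one weighted graph (hypothetical helper).

    Assumed normalization: weight(i, j) = (# graphs containing edge i-j)
    / (# bootstrap samples containing both i and j).
    """
    rng = np.random.default_rng(seed)
    n = len(points)
    edge_votes = np.zeros((n, n))
    pair_counts = np.zeros((n, n))
    for _ in range(n_graphs):
        # Bootstrap sample with replacement; duplicate indices are dropped.
        idx = np.unique(rng.integers(0, n, size=n))
        adj = knn_adjacency(points[idx], k)
        edge_votes[np.ix_(idx, idx)] += adj
        pair_counts[np.ix_(idx, idx)] += 1
    weights = np.divide(edge_votes, pair_counts,
                        out=np.zeros((n, n)), where=pair_counts > 0)
    np.fill_diagonal(weights, 0.0)
    return weights
```

Under these assumptions the returned matrix is symmetric with entries in [0, 1], so it could be passed to any clustering routine that accepts a precomputed affinity matrix, for example scikit-learn's `SpectralClustering(affinity="precomputed")`.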

Author information

Corresponding author

Correspondence to Tülin İnkaya.

Ethics declarations

Conflict of interest

The author has no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

This appendix includes the clustering results for the real data sets in Table 4.

Table 4 Clustering results for the real data sets

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

İnkaya, T. Consensus similarity graph construction for clustering. Pattern Anal Applic 26, 703–733 (2023). https://doi.org/10.1007/s10044-022-01116-w

  • DOI: https://doi.org/10.1007/s10044-022-01116-w
