Ensemble-based clustering of large probabilistic graphs using neighborhood and distance metric learning

Danesh, Malihe; Dorrigiv, Morteza; Yaghmaee, Farzin

doi:10.1007/s11227-020-03429-1

Published: 14 September 2020

The Journal of Supercomputing, volume 77, pages 4107–4134 (2021)

Abstract

Graphs are commonly used to represent relationships among data items. When the data are uncertain, the result is a probabilistic graph. Clustering is a fundamental problem on such graphs and has many applications in analyzing uncertain data. In this paper, we propose a novel method based on ensemble clustering for large probabilistic graphs. To generate the ensemble, we sample a set of probable possible worlds from the initial probabilistic graph and cluster each one. We then present a probabilistic co-association matrix as a consensus function to integrate the base clustering results. It relies on co-occurrences of node pairs, weighted by the probability of the corresponding common cluster graphs. We also apply two improvements, one before and one after ensemble generation. Before generation, we append neighborhood information based on node features to the initial graph to obtain a more accurate estimate of the probability between nodes. After generation, we use supervised metric learning based on the Mahalanobis distance to automatically learn a metric from the ensemble clusters, with the aim of capturing the crucial features of the base clustering results. We evaluate our work on five real-world datasets using three clustering evaluation metrics, namely the Dunn index, the Davies–Bouldin index, and the Silhouette coefficient. The results indicate that the method performs well in clustering large probabilistic graphs.
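The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it (`sample_world`, `connected_components`, `co_association`, `mahalanobis`, the toy edge list) is hypothetical. It makes three simplifying assumptions: edges are independent, connected components stand in for the base clustering algorithm, and each sampled world gets equal weight, whereas the paper weights co-occurrences by the probability of the common cluster graphs. The matrix `M` passed to `mahalanobis` is taken as given, so the metric learning step itself is not shown.

```python
import math
import random


def sample_world(edges, rng):
    """Sample one possible world: keep each edge (u, v, p) with probability p."""
    return [(u, v) for u, v, p in edges if rng.random() < p]


def connected_components(n, world_edges):
    """Label nodes 0..n-1 by connected component (stand-in base clustering)."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in world_edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    return [find(i) for i in range(n)]


def co_association(n, edges, num_worlds=200, seed=0):
    """Fraction of sampled worlds in which each node pair shares a cluster."""
    rng = random.Random(seed)
    co = [[0.0] * n for _ in range(n)]
    for _ in range(num_worlds):
        labels = connected_components(n, sample_world(edges, rng))
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0
    return [[c / num_worlds for c in row] for row in co]


def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a given positive semidefinite M."""
    d = [a - b for a, b in zip(x, y)]
    Md = [sum(M[i][k] * d[k] for k in range(len(d))) for i in range(len(d))]
    return math.sqrt(sum(d[i] * Md[i] for i in range(len(d))))


# Toy probabilistic graph (assumed data): two dense triangles, one weak bridge.
edges = [(0, 1, 0.9), (1, 2, 0.9), (0, 2, 0.9),
         (3, 4, 0.9), (4, 5, 0.9), (3, 5, 0.9),
         (2, 3, 0.05)]
co = co_association(6, edges)
identity = [[1.0, 0.0], [0.0, 1.0]]
dist = mahalanobis([0.0, 0.0], [3.0, 4.0], identity)
```

In this toy graph, node pairs inside the same triangle end up with co-association values close to 1, while pairs separated by the weak bridge stay close to 0. With the identity matrix, `mahalanobis` reduces to the Euclidean distance; a learned `M` would rescale the directions that best separate the ensemble clusters.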

Author information

Authors and Affiliations

Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran
Malihe Danesh, Morteza Dorrigiv & Farzin Yaghmaee

Corresponding author

Correspondence to Morteza Dorrigiv.

About this article

Cite this article

Danesh, M., Dorrigiv, M. & Yaghmaee, F. Ensemble-based clustering of large probabilistic graphs using neighborhood and distance metric learning. J Supercomput 77, 4107–4134 (2021). https://doi.org/10.1007/s11227-020-03429-1

Accepted: 03 September 2020
Published: 14 September 2020
Issue Date: April 2021
DOI: https://doi.org/10.1007/s11227-020-03429-1
