Abstract
Social networks are crucial channels for information dissemination because they facilitate the effective exchange of ideas and information. Their extensive use in daily life has led to explosive growth, and most analysis methods are impractical for networks of this scale. Analyzing a small portion of the network instead of the whole, a technique known as graph sampling, is therefore often preferable. According to a literature review, many graph sampling approaches represent connectivity through either triangle counts or node degrees, but not both. This paper introduces TDGS (triangle-induced and degree-wise graph sampling), a method that produces a sample guided by both triangle count and node degree in social networks. The key idea behind TDGS is a centrality measure that combines the degree centrality of a node with local information about the connectivity of its neighbors, and this measure guides the sampling process. Furthermore, TDGS includes a distributed model that can handle large-scale graphs. We evaluate the proposed method on real-world social networks. The experimental results show that TDGS preserves information about node degrees significantly more precisely than well-known graph sampling methods and estimates the global clustering coefficient with lower error.
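To make the quantities named in the abstract concrete, the following minimal Python sketch computes node degrees, per-node triangle counts, and the global clustering coefficient, and then samples the highest-scoring nodes. The abstract does not give the TDGS centrality formula, so the `combined_score` function, its `alpha` weight, and the top-k selection in `sample_top_k` are hypothetical stand-ins chosen only for illustration. They are not the authors' method, and the distributed model is not sketched at all.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected simple graph as a dict: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangle_counts(adj):
    """Count, for each node, the triangles it participates in."""
    return {
        u: sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
        for u, nbrs in adj.items()
    }


def global_clustering(adj):
    """Global clustering coefficient: 3 * triangles / connected triples."""
    # Summing per-node triangle counts counts every triangle three times.
    closed = sum(triangle_counts(adj).values())
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return closed / triples if triples else 0.0


def combined_score(adj, alpha=0.5):
    """HYPOTHETICAL score mixing degree centrality and triangle count.

    This weighted sum is an assumption for illustration only; it is
    not the centrality measure defined by TDGS.
    """
    tri = triangle_counts(adj)
    max_deg = max(len(adj) - 1, 1)
    max_tri = max(max(tri.values(), default=0), 1)
    return {
        u: alpha * len(adj[u]) / max_deg + (1 - alpha) * tri[u] / max_tri
        for u in adj
    }


def sample_top_k(adj, k, alpha=0.5):
    """Keep the k highest-scoring nodes and return the induced subgraph."""
    score = combined_score(adj, alpha)
    # Ties are broken by node label so the result is deterministic.
    keep = set(sorted(adj, key=lambda u: (-score[u], u))[:k])
    return {u: adj[u] & keep for u in keep}


if __name__ == "__main__":
    # Toy graph: a triangle (0, 1, 2) with a short tail 2-3-4.
    graph = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)])
    sample = sample_top_k(graph, k=3)
    print("full graph clustering:", global_clustering(graph))
    print("sample clustering:", global_clustering(sample))
```

Comparing `global_clustering` on the full graph and on the sample is one simple way to measure how well a sampler preserves triangle structure. Comparing degree distributions in the same way would cover the degree-related claim in the abstract.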
Data availability
No datasets were generated or analyzed during the current study.
Author information
Contributions
Dr. Elaheh Gavagsaz wrote the main manuscript text. Prof. Alireza Souri investigated the results and edited the main text. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
Gavagsaz, E., Souri, A. Triangle-induced and degree-wise sampling over large graphs in social networks. J Supercomput 81, 145 (2025). https://doi.org/10.1007/s11227-024-06613-9