Triangle-induced and degree-wise sampling over large graphs in social networks

The Journal of Supercomputing

Abstract

Social networks are crucial channels for information dissemination because they enable the effective exchange of ideas and information. Their extensive use in daily life has led to explosive growth, and most analysis methods are impractical at that scale. Analyzing a small portion of the network instead of the whole, a technique known as graph sampling, is therefore often preferable. According to a literature review, many graph sampling approaches represent connectivity through either triangle counts or node degrees, but not both. This paper introduces TDGS (triangle-induced and degree-wise graph sampling), a method that produces a sample guided by both triangle count and node degree in social networks. The key idea behind TDGS is a centrality measure that combines degree centrality with each node's local information about the connectivity of its neighbors; this measure guides the sampling process. TDGS also includes a distributed model that can handle large-scale graphs. We evaluate the proposed method on real-world social networks. The experimental results show that TDGS provides significantly more precise information about node degrees than well-known graph sampling methods and estimates the global clustering coefficient with lower error.
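The abstract does not state the TDGS centrality formula, so the following is only a minimal sketch of the general idea of scoring nodes by both degree and triangle participation. The weighted score in `hybrid_scores`, the `alpha` parameter, and the top-k selection in `sample_top_k` are all hypothetical choices made for illustration, not the authors' method. The global clustering coefficient is computed with its standard definition (closed triplets divided by all connected triplets).

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}, skipping self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles_per_node(adj):
    """Count, for each node, the pairs of its neighbors that are themselves linked."""
    return {
        u: sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        for u, nbrs in adj.items()
    }


def global_clustering(adj):
    """Standard global clustering coefficient: closed triplets / connected triplets."""
    closed = sum(triangles_per_node(adj).values())  # equals 3 * number of triangles
    triplets = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return closed / triplets if triplets else 0.0


def hybrid_scores(adj, alpha=0.5):
    """HYPOTHETICAL score: weighted mix of degree centrality and normalized triangle count."""
    n = len(adj)
    tri = triangles_per_node(adj)
    max_tri = max(tri.values(), default=0) or 1  # avoid division by zero
    return {
        u: alpha * len(adj[u]) / max(n - 1, 1) + (1 - alpha) * tri[u] / max_tri
        for u in adj
    }


def sample_top_k(adj, k, alpha=0.5):
    """HYPOTHETICAL sampler: keep the k highest-scoring nodes and the edges among them."""
    scores = hybrid_scores(adj, alpha)
    keep = set(sorted(adj, key=lambda u: (-scores[u], u))[:k])
    return {u: adj[u] & keep for u in keep}


# Toy graph: a triangle (0, 1, 2) with a tail 2-3-4.
graph = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)])
sample = sample_top_k(graph, k=3)
print(global_clustering(graph), global_clustering(sample))
```

On this toy graph the sample keeps only the triangle, so its clustering coefficient is higher than that of the full graph. A sampler evaluated as in the abstract would instead be judged by how closely the sample's degree information and clustering coefficient match those of the original network.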

Data availability

No datasets were generated or analyzed during the current study.

Author information

Contributions

Dr. Elaheh Gavagsaz wrote the main manuscript text, and Prof. Alireza Souri investigated the results and edited the main text. All authors reviewed the manuscript.

Corresponding author

Correspondence to Alireza Souri.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gavagsaz, E., Souri, A. Triangle-induced and degree-wise sampling over large graphs in social networks. J Supercomput 81, 145 (2025). https://doi.org/10.1007/s11227-024-06613-9
