An adaptive graph sampling framework for graph analytics

  • Original Article
  • Social Network Analysis and Mining

A Correction to this article was published on 09 February 2024

This article has been updated

Abstract

In large-scale data processing, graph analytics on complex interaction networks is indispensable. Because processing and analyzing an entire graph can be inefficient and is often impractical, graph sampling, which keeps only a portion of the original graph, has become a favored approach. Prior work has focused on fixed edge and node selection strategies based on predetermined criteria, with no adaptive feedback to adjust the sampling process; such algorithms offer limited flexibility and estimation accuracy on complex graphs. In this paper, we propose an adaptive graph sampling framework and design AdapES, an adaptive edge sampling algorithm based on this framework. Unlike non-adaptive sampling methods, our approach continually monitors the difference between the current sampled subgraph and the original graph and dynamically adjusts the edge sampling probability based on this observed sampling difference. Guided by a preset sampling goal, the algorithm adapts automatically and flexibly to fluctuations in the random sampling process. An experimental evaluation on 11 datasets demonstrates that AdapES outperforms other algorithms in preserving various graph properties and statistics.
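
The feedback idea in the abstract can be illustrated with a small sketch. This is not AdapES, whose details are not given in this preview; it swaps in classical selection sampling (Knuth's Algorithm S) as a stand-in. The function name `adaptive_edge_sample`, its parameters, and the choice of edge count as the only monitored difference are assumptions made for illustration.

```python
import random


def adaptive_edge_sample(edges, target_fraction, seed=0):
    """Illustrative one-pass edge sampler with a feedback-adjusted probability.

    Hypothetical sketch (selection sampling), not the paper's AdapES algorithm.
    """
    rng = random.Random(seed)
    total = len(edges)
    goal = round(target_fraction * total)  # preset sampling goal, as an edge count
    sampled = []
    for seen, edge in enumerate(edges):
        remaining = total - seen       # edges not yet examined, including this one
        needed = goal - len(sampled)   # observed gap between goal and current sample
        p = needed / remaining         # keep-probability adapts to that gap
        if rng.random() < p:
            sampled.append(edge)
    return sampled


if __name__ == "__main__":
    path_edges = [(i, i + 1) for i in range(20)]
    print(adaptive_edge_sample(path_edges, 0.25, seed=1))
```

If random draws leave the sample behind the goal, `p` rises for later edges; if the sample runs ahead, `p` falls. Under this rule the sampler ends with exactly `goal` edges. A framework like the one described would presumably monitor richer graph properties than edge count alone.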

Author information

Contributions

KW contributed to conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing (original draft), writing (review and editing), and visualization.

Corresponding author

Correspondence to Kewen Wang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

The original online version of this article was revised to incorporate corrections that had been missed.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, K. An adaptive graph sampling framework for graph analytics. Soc. Netw. Anal. Min. 14, 4 (2024). https://doi.org/10.1007/s13278-023-01157-x

  • DOI: https://doi.org/10.1007/s13278-023-01157-x
