
Persistent graph stream summarization for real-time graph analytics

World Wide Web

Abstract

In massive, fast-moving graph streams, summarizing the structure of the stream is an important task because it enables efficient and effective graph query processing. Although this task has been studied extensively, we observe that existing graph sketches can only answer queries about the current state of the graph stream. In this paper, we design persistent graph sketches that support graph queries over any given time range in the past. We first introduce a baseline method that extends an existing graph summarization method. However, our empirical study suggests that the accuracy of the baseline is unsatisfactory, especially when the query time interval is large. To tackle this issue, we propose a new method, PGSS-BDH, which stores the streaming edges in a set of hierarchically organized hashmaps. When a query arrives, we divide the query time interval into a set of disjoint sub-intervals and return the sum of the query results over all sub-intervals as the overall answer. Observing that the space cost of PGSS-BDH is linear in the graph stream size, we further propose an advanced method, PGSS-MDC, which uses a set of fixed-size hierarchical counters to store edge weights and processes queries in a similar way to PGSS-BDH. We theoretically analyze the accuracy of PGSS-BDH and PGSS-MDC. Experimental results on real-life datasets demonstrate that PGSS-MDC returns much more accurate answers than its competitors while consuming comparable query time and much less memory.
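
To make the interval-decomposition idea concrete, the following is a minimal Python sketch. It is not the authors' PGSS-BDH implementation, whose details are not given in this abstract. It assumes integer timestamps, a dyadic (power-of-two) hierarchy of time buckets, and one exact hashmap per level. The class name `DyadicEdgeStore` and all method names are hypothetical.

```python
from collections import defaultdict


class DyadicEdgeStore:
    """Toy time-range edge-weight store (illustrative assumption, not PGSS-BDH itself).

    Timestamps are assumed to be integers in [0, 2**levels).
    """

    def __init__(self, levels):
        self.levels = levels
        # tables[l] maps (bucket, src, dst) -> total weight, where a
        # level-l bucket covers 2**l consecutive timestamps.
        self.tables = [defaultdict(int) for _ in range(levels + 1)]

    def insert(self, src, dst, weight, t):
        # Record the edge in the bucket that contains t at every level.
        for level in range(self.levels + 1):
            self.tables[level][(t >> level, src, dst)] += weight

    def decompose(self, start, end):
        """Split the inclusive range [start, end] into disjoint dyadic buckets."""
        pieces = []
        end += 1  # work with the half-open range [start, end)
        while start < end:
            level = 0
            # Grow the bucket while it stays aligned, fits in the range,
            # and does not exceed the top level of the hierarchy.
            while (level < self.levels
                   and start % (2 << level) == 0
                   and start + (2 << level) <= end):
                level += 1
            pieces.append((level, start >> level))
            start += 1 << level
        return pieces

    def edge_weight(self, src, dst, start, end):
        # Sum the per-bucket answers over all disjoint sub-intervals.
        return sum(self.tables[level].get((bucket, src, dst), 0)
                   for level, bucket in self.decompose(start, end))


store = DyadicEdgeStore(levels=4)
store.insert("a", "b", 2, t=3)
store.insert("a", "b", 5, t=7)
store.insert("a", "b", 1, t=12)
store.insert("a", "c", 4, t=7)

pieces = store.decompose(3, 10)
in_range = store.edge_weight("a", "b", 3, 10)
whole = store.edge_weight("a", "b", 0, 15)
```

In this sketch a query touches at most about two buckets per level, so query cost grows with the number of levels rather than with the length of the time range. Because every edge is stored exactly at every level, memory grows with the stream, which is the kind of cost that fixed-size counters are meant to avoid.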


Data Availability

The datasets used and the executable code are available at https://www.github.com/Yishiliucaihua/pgss/.

Notes

  1. http://www.uschinahpa.org/2020/03/wechat-tencent-annual-report/


Funding

This work was supported in part by the Major Key Project of PCL (PCL2022A03), National Natural Science Foundation of China (62002108), and Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies (2022B1212010005).

Author information

Contributions

Yan Jia and Zhaoquan Gu wrote the main manuscript text. Yan Jia proposed the main technical ideas for the methods in the manuscript. Zhihao Jiang and Cuiyun Gao implemented the source code and conducted the experiments. Jianye Yang prepared all figures in the manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Zhaoquan Gu.

Ethics declarations

Ethical Approval

Not applicable

Competing interests

The authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jia, Y., Gu, Z., Jiang, Z. et al. Persistent graph stream summarization for real-time graph analytics. World Wide Web 26, 2647–2667 (2023). https://doi.org/10.1007/s11280-023-01165-z

  • DOI: https://doi.org/10.1007/s11280-023-01165-z
