Abstract
Graph stream is used to express complex and highly dynamic relationships between entities, such as friendships in social networks. The storage and mining on graph stream are the main research areas of big data research. Triangle listing/counting is an important topic in graph mining research. The triangle, as the simplest circle and clique structure, has many applications in many real-world scenarios. A large amount of related work also exists on the study of triangles on graph streams. However, the existing research has focused on triangle counting on static graphs or graph streams, and there is a lack of research targeting heavy weight triangle listing. This paper formally defines the triangle weight on graph stream. Based on this definition, this paper presents an approximation algorithm for the top-k heavy weight triangle listing problem on graph stream, and proposes various optimized data structures DolhaT, Filtered DolhaT and Double Filtered DolhaT (DFD) to solve this problem. Experiments on real graph stream datasets demonstrate the effectiveness of the proposed optimized structures for the heavy weight triangle listing problem on graph stream.











Similar content being viewed by others
Data availability
All experimental data generated and analysed during this study are included in this published article.
References
Guha, S., McGregor, A.: Graph synopses, sketches, and streams: A survey. Proc. VLDB Endow 5(12), 2030–2031 (2012)
Tweet statistics. http://expandedramblings.com/index.php/march-2013-by-the-numbers-a-few-amazingtwitter-stats/10/
Email Statistics Report, 2015-2019. https://radicati.com/wp/wp-content/uploads/2015/02/Email-Statistics-Report-2015-2019-Executive-Summary.pdf
Broder, A.Z., Mitzenmacher, M.: Survey: Network applications of bloom filters: A survey. Internet Math. 1(4), 485–509 (2003)
Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55(1), 58–75 (2005)
Metwally, A., Agrawal, D., Abbadi, A.E.: Efficient computation of frequent and top-k elements in data streams 3363, 398–412 (2005)
Homem, N., Carvalho, J.P.: Finding top-k elements in data streams. Inf. Sci. 180(24), 4958–4974 (2010)
Afek, Y., Bremler-Barr, A., Cohen, E., Feibish, S.L., Shagam, M.: Efficient distinct heavy hitters for DNS ddos attack detection. arXiv:1612.02636 (2016)
Basat, R.B., Chen, X., Einziger, G., Rottenstreich, O.: Designing heavy-hitter detection algorithms for programmable switches. IEEE/ACM Trans. Netw. 28(3), 1172–1185 (2020)
Newman, M.E., Watts, D.J., Strogatz, S.H.: Random graph models of social networks. Proceedings of the National Academy of Sciences 99(suppl 1), 2566–2572 (2002)
Pourhabibi, T., Ong, K., Kam, B., Boo, Y.L.: Fraud detection: A systematic literature review of graph-based anomaly detection approaches. Decis. Support Syst. 133, 113303 (2020)
Stefani, L.D., Epasto, A., Riondato, M., Upfal, E.: Trièst: Counting local and global triangles in fully dynamic streams with fixed memory size. ACM Trans. Knowl. Discov. Data 11(4), 43–14350 (2017)
Qiu, X., Cen, W., Qian, Z., Peng, Y., Zhang, Y., Lin, X., Zhou, J.: Real-time constrained cycle detection in large dynamic graphs. Proc. VLDB Endow. 11(12), 1876–1888 (2018)
Berry, J.W., Hendrickson, B., LaViolette, R.A., Phillips, C.A.: Tolerating the community detection resolution limit with edge weighting. Physical Review E 83(5), 056119 (2011)
Eckmann, J.-P., Moses, E.: Curvature of co-links uncovers hidden thematic layers in the world wide web. Proceedings of the National Academy of Sciences 99(9), 5825–5829 (2002)
Becchetti, L., Boldi, P., Castillo, C., Gionis, A.: Efficient algorithms for large-scale local triangle counting. ACM Trans. Knowl. Discov. Data 4(3), 13–11328 (2010)
Chu, S., Cheng, J.: Triangle listing in massive networks and its applications, 672–680 (2011)
Lim, Y., Kang, U.: MASCOT: memory-efficient and accurate sampling for counting local triangles in graph streams, 685–694 (2015)
Lee, D., Shin, K., Faloutsos, C.: Temporal locality-aware sampling for accurate triangle counting in real graph streams. VLDB J. 29(6), 1501–1525 (2020)
Vitter, J.S.: Random sampling with a reservoir. ACM Trans. Math. Softw. 11(1), 37–57 (1985)
Gemulla, R., Lehner, W., Haas, P.J.: Maintaining bounded-size sample synopses of evolving datasets. VLDB J. 17(2), 173–202 (2008)
Wang, P., Qi, Y., Sun, Y., Zhang, X., Tao, J., Guan, X.: Approximately counting triangles in large graph streams including edge duplicates with a fixed memory usage. Proc. VLDB Endow. 11(2), 162–175 (2017)
Jung, M., Lim, Y., Lee, S., Kang, U.: FURL: fixed-memory and uncertainty reducing local triangle counting for multigraph streams. Data Min. Knowl. Discov. 33(5), 1225–1253 (2019)
Shin, K., Oh, S., Kim, J., Hooi, B., Faloutsos, C.: Fast, accurate and provable triangle counting in fully dynamic graph streams. ACM Trans. Knowl. Discov. Data 14(2), 12–11239 (2020)
Ting, D.: Streamed approximate counting of distinct elements: beating optimal batch methods, 442–451 (2014)
Pavan, A., Tangwongsan, K., Tirthapura, S., Wu, K.: Counting and sampling triangles from a graph stream. Proc. VLDB Endow. 6(14), 1870–1881 (2013)
Jha, M., Seshadhri, C., Pinar, A.: A space efficient streaming algorithm for triangle counting using the birthday paradox, 589–597 (2013)
Ahmed, N.K., Duffield, N.G., Neville, J., Kompella, R.R.: Graph sample and hold: a framework for big-graph analytics, 1446–1455 (2014)
Yang, T., Zhang, H., Yang, D., Huang, Y., Li, X.: Finding significant items in data streams, 1394–1405 (2019)
Kumar, V., Sinha, D.: A robust intelligent zero-day cyber-attack detection technique. Complex & Intelligent Systems 7(5), 2211–2234 (2021)
Choudhury, S., Holder, L.B., Jr., G.C., Agarwal, K., Feo, J.: A selectivity based approach to continuous pattern detection in streaming graphs, 157–168 (2015)
Li, Y., Zou, L., Özsu, M.T., Zhao, D.: Time constrained continuous subgraph search over streaming graphs, 1082–1093 (2019)
Kong, Y.-X., Shi, G.-Y., Wu, R.-J., Zhang, Y.-C.: k-core: Theories and applications. Physics Reports 832, 1–32 (2019)
Zhang, F., Zou, L., Zeng, L., Gou, X.: Dolha - an efficient and exact data structure for streaming graphs. World Wide Web 23(2), 873–903 (2020)
Li, J., Li, Z., Xu, Y., Jiang, S., Yang, T., Cui, B., Dai, Y., Zhang, G.: Wavingsketch: An unbiased and generic sketch for finding top-k items in data streams. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1574–1584 (2020)
Fan, Z., Hu, Z., Wu, Y., Guo, J., Liu, W., Yang, T., Wang, H., Xu, Y., Uhlig, S., Tu, Y.: Pisketch: finding persistent and infrequent flows. In: Proceedings of the ACM SIGCOMM Workshop on Formal Foundations and Security of Programmable Network Infrastructures, pp. 8–14 (2022)
Song, C., Liu, X., Ge, T., Ge, Y.: Top-k frequent items and item frequency tracking over sliding windows of any size. Information Sciences 475, 100–120 (2019)
Ben-Basat, R., Einziger, G., Friedman, R., Kassner, Y.: Heavy hitters in streams and sliding windows. In: IEEE INFOCOM 2016-The 35th Annual IEEE International Conference on Computer Communications, pp. 1–9 (2016). IEEE
Alon, N., Yuster, R., Zwick, U.: Finding and counting given length cycles. Algorithmica 17(3), 209–223 (1997)
Schank, T., Wagner, D.: Finding, counting and listing all triangles in large graphs, an experimental study 3503, 606–609 (2005)
Gall, F.L.: Improved quantum algorithm for triangle finding via combinatorial arguments, 216–225 (2014)
Vassilevska, V., Williams, R.: Finding a maximum weight triangle in \(o(n^{3-\delta })\) time, with applications, 225–231 (2006)
Czumaj, A., Lingas, A.: Finding a heaviest vertex-weighted triangle is not harder than matrix multiplication. SIAM J. Comput. 39(2), 431–444 (2009)
Patrascu, M.: Towards polynomial lower bounds for dynamic problems, 603–610 (2010)
Williams, V.V., Williams, R.: Subcubic equivalences between path, matrix and triangle problems, 645–654 (2010)
Williams, V.V., Williams, R.: Finding, minimizing, and counting weighted subgraphs. SIAM J. Comput. 42(3), 831–854 (2013)
Williams, R.R.: Faster all-pairs shortest paths via circuit complexity. SIAM J. Comput. 47(5), 1965–1985 (2018)
Cohen, W.W.: Enron email dataset. http://www.cs.cmu.edu/~enron/. Accessed in 2009
Rossi, R.A., Ahmed, N.K.: The network data repository with interactive graph analytics and visualization, 4292–4293 (2015)
Mislove, A., Koppula, H.S., Gummadi, K.P., Druschel, P., Bhattacharjee, B.: Growth of the flickr social network, 25–30 (2008)
Richardson, M., Agrawal, R., Domingos, P.M.: Trust management for the semantic web 2870, 351–368 (2003)
Massa, P., Avesani, P.: Controversial users demand local trust metrics: An experimental study on epinions.com community, 121–126 (2005)
Acknowledgements
Not applicable.
Funding
Not applicable.
Author information
Authors and Affiliations
Contributions
Fan Zhang was a major contributor in the problem definition and the algorithm design and implementation as well as the manuscript writing. Xiangyang Gou revised the paper writing. Lei Zou guided the whole project and revised the paper writing. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethical approval and Consent to participate
Not applicable.
Consent for publication
Not applicable.
Human and animal ethics
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, F., Gou, X. & Zou, L. Top-k heavy weight triangles listing on graph stream. World Wide Web 26, 1827–1851 (2023). https://doi.org/10.1007/s11280-022-01117-z
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11280-022-01117-z