research-article

Distributed Triangle Approximately Counting Algorithms in Simple Graph Stream

Published: 08 January 2022

Abstract

Algorithms for counting local topological structures, such as triangles, are widely used in social network analysis, recommendation systems, user profiling, and other fields. The problem of counting global and local triangles in a graph stream has been studied extensively, and numerous streaming algorithms for triangle counting have emerged. To improve the throughput and scalability of these algorithms, many studies have investigated distributed streaming algorithms that run on multiple machines. In this article, we first propose a framework for distributed streaming algorithms based on the Master-Worker-Aggregator architecture. The framework has two core parts: an edge distribution strategy, which largely determines performance, including communication overhead and workload balance, and an aggregation method, which is critical for obtaining unbiased estimates of the global and local triangle counts in a graph stream. We then extend the state-of-the-art centralized algorithm TRIÈST into four distributed algorithms under our framework. Experimental results show that, compared with its competitors, DVHT-i is both accurate and fast, outperforming the best existing distributed streaming algorithm. DEHT-b is the fastest of the algorithms and has the lowest communication overhead; it also achieves almost perfect workload balance.
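The Master-Worker-Aggregator idea can be illustrated with a minimal Python sketch. It is not any of the four algorithms from the article: the abstract does not specify their edge distribution strategies or aggregation formulas, so the sketch assumes the simplest possible choices. The master broadcasts every edge to every worker, each worker keeps an independent fixed-size reservoir sample in the style of the TRIÈST base estimator, and the aggregator averages the workers' global estimates, which stays unbiased because each worker's estimate is unbiased on its own. The names `ReservoirWorker` and `estimate_global_triangles` are hypothetical, only the global count is estimated, and the input is assumed to be a simple graph stream (no duplicate edges, no self-loops).

```python
import random
from collections import defaultdict


class ReservoirWorker:
    """One worker: reservoir-samples edges and counts triangles in its sample."""

    def __init__(self, capacity, seed):
        if capacity < 3:
            raise ValueError("capacity must be at least 3")
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.sample = []             # sampled edges (the reservoir)
        self.adj = defaultdict(set)  # adjacency sets of the sampled subgraph
        self.seen = 0                # number of edges observed so far
        self.in_sample = 0           # triangles fully contained in the sample

    def _common(self, u, v):
        # Each common neighbor of u and v closes one triangle with edge (u, v).
        return len(self.adj[u] & self.adj[v])

    def _add(self, u, v):
        self.in_sample += self._common(u, v)
        self.adj[u].add(v)
        self.adj[v].add(u)

    def _remove(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        self.in_sample -= self._common(u, v)

    def process(self, u, v):
        self.seen += 1
        if self.seen <= self.capacity:
            # Reservoir not full yet: keep every edge.
            self.sample.append((u, v))
            self._add(u, v)
        elif self.rng.random() < self.capacity / self.seen:
            # Standard reservoir step: replace a uniformly chosen sampled edge.
            i = self.rng.randrange(self.capacity)
            self._remove(*self.sample[i])
            self.sample[i] = (u, v)
            self._add(u, v)

    def estimate(self):
        # Scale by the inverse probability that all three edges of a
        # triangle are in the sample at the same time.
        t, m = self.seen, self.capacity
        scale = max(1.0, t * (t - 1) * (t - 2) / (m * (m - 1) * (m - 2)))
        return scale * self.in_sample


def estimate_global_triangles(stream, num_workers=4, capacity=1000, seed=0):
    """Master broadcasts each edge; aggregator averages the worker estimates."""
    workers = [ReservoirWorker(capacity, seed + i) for i in range(num_workers)]
    for u, v in stream:
        for worker in workers:  # assumed strategy: broadcast to all workers
            worker.process(u, v)
    return sum(w.estimate() for w in workers) / num_workers


# Complete graph on 4 vertices: the reservoir holds every edge, so the
# estimate is exact.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(estimate_global_triangles(k4, num_workers=3, capacity=10))  # 4.0
```

Broadcasting maximizes communication overhead, since every worker receives every edge. That cost is exactly what an edge distribution strategy is meant to reduce, and sending each worker only a subset of the edges in turn changes how the aggregator must rescale the partial results.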

Published In

ACM Transactions on Knowledge Discovery from Data  Volume 16, Issue 4
August 2022
529 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3505210

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 January 2022
Accepted: 01 October 2021
Revised: 01 September 2021
Received: 01 May 2021
Published in TKDD Volume 16, Issue 4

Author Tags

  1. Global and local triangle counting
  2. distributed streaming algorithm
  3. graph stream

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Natural Science Foundation of China
  • Sichuan Science and Technology Program
  • The Science and Technology Achievements Transformation Demonstration Project of Sichuan Province of China
  • The Fundamental Research Funds for the Central Universities

Article Metrics

  • Downloads (last 12 months): 84
  • Downloads (last 6 weeks): 9
Reflects downloads up to 05 Mar 2025.

Cited By

  • (2024) Compact Estimator for Streaming Triangle Counting. IEEE Transactions on Knowledge and Data Engineering 36:8 (3712-3724). DOI: 10.1109/TKDE.2024.3371228. Online publication date: 1-Aug-2024
  • (2024) DTC: Real-Time and Accurate Distributed Triangle Counting in Fully Dynamic Graph Streams. 2024 43rd International Symposium on Reliable Distributed Systems (SRDS) (198-209). DOI: 10.1109/SRDS64841.2024.00028. Online publication date: 30-Sep-2024
  • (2023) A distributed streaming framework for edge–cloud triangle counting in graph streams. Knowledge-Based Systems 278:C. DOI: 10.1016/j.knosys.2023.110878. Online publication date: 25-Oct-2023
  • (2023) Global triangle estimation based on first edge sampling in large graph streams. The Journal of Supercomputing 79:13 (14079-14116). DOI: 10.1007/s11227-023-05205-3. Online publication date: 3-Apr-2023
  • (2022) Efficient Retrieval of Top-k Weighted Triangles on Static and Dynamic Spatial Data. IEEE Access 10 (55298-55307). DOI: 10.1109/ACCESS.2022.3177620. Online publication date: 2022
