Efficient MapReduce algorithms for triangle listing in billion-scale graphs

Zhu, Yuanyuan; Zhang, Hao; Qin, Lu; Cheng, Hong

doi:10.1007/s10619-017-7193-1

Published: 17 March 2017

Volume 35, pages 149–176 (2017)

Distributed and Parallel Databases

Yuanyuan Zhu¹, Hao Zhang¹, Lu Qin² & Hong Cheng³


Abstract

This paper addresses the classical triangle listing problem: enumerating every set of three vertices that are pairwise connected by edges. The problem has been studied intensively for internal and external memory, but it remains a pressing challenge in distributed environments, where multiple machines across a network can be used to achieve good performance and scalability. MapReduce, one of the de facto computing models for distributed environments, has been used in several existing triangle listing algorithms. However, these algorithms usually need to shuffle a huge amount of intermediate data, which seriously hinders their scalability on large-scale graphs. In this paper, we propose FTL, a new triangle listing algorithm in MapReduce. FTL uses a lightweight data structure to substantially reduce the intermediate data transferred during the shuffle stage, and it is equipped with multiple-round techniques that ease the burden on memory and network bandwidth when dealing with billion-scale graphs. We prove that the size of the intermediate data is bounded close to the number of triangles in the graph. To further reduce the shuffle size and memory cost, we also propose improved algorithms based on a compact data structure, and we present several optimization techniques that accelerate the computation and reduce memory consumption. Extensive experimental results show that our algorithms outperform existing competitors by several times on both synthetic and real-world graphs.
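To make the problem statement and the shuffle cost concrete, here is a minimal single-machine sketch in Python. It is not FTL and not taken from the paper: it emulates a generic two-round, wedge-based listing scheme in memory, and the function name `list_triangles`, the degree-based vertex ordering, and the returned wedge count are all illustrative assumptions. The wedge count stands in for the intermediate records that a real MapReduce job would have to shuffle.

```python
from collections import defaultdict
from itertools import combinations


def list_triangles(edges):
    """Return (triangles, wedge_count) for an undirected simple graph.

    Hypothetical helper for illustration only; it is NOT the FTL algorithm.
    """
    # Normalise the input: drop self-loops and duplicate edges.
    edge_set = {tuple(sorted(e)) for e in edges if e[0] != e[1]}

    degree = defaultdict(int)
    for u, v in edge_set:
        degree[u] += 1
        degree[v] += 1

    def rank(v):
        # Total order on vertices: by degree, ties broken by vertex id.
        return (degree[v], v)

    # Round 1, map side: orient every edge from lower to higher rank.
    out = defaultdict(list)
    for u, v in edge_set:
        lo, hi = (u, v) if rank(u) < rank(v) else (v, u)
        out[lo].append(hi)

    # Round 1, reduce side: each pair of out-neighbours of a pivot forms
    # an open wedge. These wedges are the intermediate data.
    wedges = []
    for pivot, nbrs in out.items():
        for a, b in combinations(sorted(nbrs), 2):
            wedges.append(((a, b), pivot))

    # Round 2: a wedge closes into a triangle if its endpoints share an edge.
    triangles = sorted(
        tuple(sorted((pivot, a, b)))
        for (a, b), pivot in wedges
        if (a, b) in edge_set
    )
    return triangles, len(wedges)


if __name__ == "__main__":
    tris, shuffled = list_triangles([(1, 2), (2, 3), (1, 3), (3, 4)])
    print(tris, shuffled)
```

Because each triangle is reported only from its lowest-ranked vertex, no triangle is listed twice. In this kind of scheme the number of wedges can far exceed the number of triangles on skewed graphs, which is the sort of intermediate-data blow-up the abstract describes.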



Acknowledgements

This work was partially supported by grants from the National Science Foundation of China (61502349), the Hubei Provincial Natural Science Foundation of China (2015CFB339), the Scientific and Technologic Development Programme of Suzhou (SYG201442), the Research Grants Council of Hong Kong (14209314 and 14221716), a Chinese University of Hong Kong Direct Grant (4055048), and the Australian Research Council (DE140100999 and DP160101513).

Author information

Authors and Affiliations

State Key Lab of Software Engineering, School of Computer, Wuhan University, Wuhan, China
Yuanyuan Zhu & Hao Zhang
Centre for Quantum Computation and Intelligent Systems, University of Technology Sydney, Sydney, Australia
Lu Qin
The Chinese University of Hong Kong, Hong Kong, China
Hong Cheng


Corresponding author

Correspondence to Yuanyuan Zhu.


About this article

Cite this article

Zhu, Y., Zhang, H., Qin, L. et al. Efficient MapReduce algorithms for triangle listing in billion-scale graphs. Distrib Parallel Databases 35, 149–176 (2017). https://doi.org/10.1007/s10619-017-7193-1


Published: 17 March 2017
Issue Date: June 2017
DOI: https://doi.org/10.1007/s10619-017-7193-1
