DOI: 10.1145/3409501.3409507
research-article

HOSA: Fast Distributed Triangle Enumerating On Tera-Edge Graphs

Published: 25 August 2020

Abstract

Distributed triangle enumerating has attracted considerable interest because it can process large graphs quickly. However, existing solutions usually suffer from low speed, poor graph-size scalability, or both, caused by the massive number of short messages passed over the network or by the huge amount of intermediate data. To address these issues, we propose HOSA, a distributed out-of-core triangle enumerating algorithm. HOSA first introduces an I/O-efficient graph placement strategy that divides a graph and places it across a cluster in a load-balanced manner, which reduces intermediate data and exploits the aggregated I/O bandwidth of the cluster. To enumerate triangles, HOSA then uses an efficient algorithm that eliminates short messages, the main cause of the low speed of most previous work, and makes effective use of the aggregated network and I/O bandwidth to further improve performance, achieving both high speed and graph-size scalability. Extensive evaluations show that HOSA can process much larger graphs at speeds better than, or at least comparable to, those of state-of-the-art algorithms, including LiteTE, the best distributed-memory algorithm.
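
For readers unfamiliar with the underlying problem, the following is a minimal single-machine, in-memory sketch of triangle enumeration. It is not HOSA's algorithm: the abstract does not specify the placement strategy or the communication scheme, so nothing distributed or out-of-core is modeled here. The function name `enumerate_triangles` and the degree-based vertex ordering are assumptions chosen for illustration; the ordering is a common way to make sure each triangle is reported exactly once.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple


def enumerate_triangles(
    edges: Iterable[Tuple[int, int]]
) -> Iterator[Tuple[int, int, int]]:
    """Yield every triangle of an undirected graph exactly once.

    Illustrative baseline only (hypothetical helper, not from the paper).
    """
    # Build undirected adjacency sets; self-loops are ignored and
    # duplicate edges collapse automatically because sets are used.
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Total order on vertices: by degree, ties broken by vertex id.
    rank = {v: (len(nbrs), v) for v, nbrs in adj.items()}

    # Orient each edge from the lower-ranked to the higher-ranked endpoint.
    out = {v: {w for w in nbrs if rank[w] > rank[v]} for v, nbrs in adj.items()}

    # A triangle with ranks a < b < c is found only at the oriented edge
    # (a, b), where c lies in the intersection of both out-neighbor sets.
    for u, out_u in out.items():
        for v in out_u:
            for w in out_u & out[v]:
                yield (u, v, w)


if __name__ == "__main__":
    sample = [(0, 1), (1, 2), (0, 2), (2, 3)]
    for tri in enumerate_triangles(sample):
        print(sorted(tri))
```

Even this simple version materializes the whole adjacency structure in memory, which is the kind of limitation that out-of-core and distributed approaches such as the one described in the abstract aim to remove.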

    Published In

    HPCCT & BDAI '20: Proceedings of the 2020 4th High Performance Computing and Cluster Technologies Conference & 2020 3rd International Conference on Big Data and Artificial Intelligence
    July 2020
    276 pages
    ISBN:9781450375603
    DOI:10.1145/3409501

    In-Cooperation

    • Xi'an Jiaotong-Liverpool University

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. distributed computing
    2. graph processing
    3. out-of-core processing
    4. parallel processing
    5. triangle computation
    6. triangle enumerating

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    HPCCT & BDAI 2020
