skip to main content
research-article

Efficient Distributed Hop-Constrained Path Enumeration on Large-Scale Graphs

Published:26 March 2024Publication History
Skip Abstract Section

Abstract

The enumeration of hop-constrained simple paths is a building block in many graph-based areas. Due to the enormous search spaces in large-scale graphs, a single machine can hardly satisfy the requirements of both efficiency and memory, which causes an urgent need for efficient distributed methods. In practice, it is inevitable to produce plenty of intermediate results when directly extending centralized methods to the distributed environment, thereby causing a memory crisis and weakening the query performance. The state-of-the-art distributed method HybridEnum designed a hybrid search paradigm to enumerate simple paths. However, it makes massive exploration for the redundant vertices not located in any simple path, thereby resulting in poor query performance. To alleviate this problem, we design a distributed approach DistriEnum to optimize query performance and scalability with well-bound memory consumption. Firstly, DistriEnum adopts a graph reduction strategy to rule out the redundant vertices without satisfying the constraint of hop number. Then, a core search paradigm is designed to simultaneously reduce the traversal of shared subpaths and the storage of intermediate results. Moreover, DistriEnum is equipped with a task division strategy to theoretically achieve workload balance. Finally, a vertex migration strategy is devised to reduce the communication cost during the enumeration. The comprehensive experimental results on 10 real-world graphs demonstrate that DistriEnum achieves up to 3 orders of magnitude speedup than HybridEnum in query performance and exhibits superior performances on scalability, communication cost, and memory consumption.

References

  1. Takuya Akiba, Yoichi Iwata, and Yuichi Yoshida. 2013. Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In SIGMOD. ACM, 349--360.Google ScholarGoogle Scholar
  2. Michael A. Bender, Jeremy T. Fineman, Seth Gilbert, and Robert E. Tarjan. 2016. A New Approach to Incremental Cycle Detection and Related Problems. ACM Trans. Algorithms, Vol. 12, 2 (2016), 14:1--14:22.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Sayan Bhattacharya and Janardhan Kulkarni. 2020. An Improved Algorithm for Incremental Cycle Detection and Topological Ordering in Sparse Graphs. In SODA. SIAM, 2509--2521.Google ScholarGoogle Scholar
  4. Etienne Birmelé, Rui A. Ferreira, Roberto Grossi, Andrea Marino, Nadia Pisanti, Romeo Rizzi, and Gustavo Sacomoto. 2013. Optimal Listing of Cycles and st-Paths in Undirected Graphs. In SODA. SIAM, 1884--1896.Google ScholarGoogle Scholar
  5. Katerina Böhmová, Luca Hafliger, Matús Mihalák, Tobias Prö ger, Gustavo Sacomoto, and Marie-France Sagot. 2018. Computing and Listing st-Paths in Public Transportation Networks. Theory Comput. Syst., Vol. 62, 3 (2018), 600--621.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Yuzheng Cai, Siyuan Liu, Weiguo Zheng, and Xuemin Lin. 2023. Towards Generating Hop-constrained s-t Simple Path Graphs. CoRR, Vol. abs/2304.12656 (2023). https://doi.org/10.48550/arXiv.2304.12656Google ScholarGoogle ScholarCross RefCross Ref
  7. Lijun Chang, Xuemin Lin, Lu Qin, Jeffrey Xu Yu, and Jian Pei. 2015. Efficiently Computing Top-K Shortest Path Join. In EDBT. OpenProceedings.org, 133--144.Google ScholarGoogle Scholar
  8. Xiaoshuang Chen, Kai Wang, Xuemin Lin, Wenjie Zhang, Lu Qin, and Ying Zhang. 2021. Efficiently Answering Reachability and Path Queries on Temporal Bipartite Graphs. Proc. VLDB Endow., Vol. 14, 10 (2021), 1845--1858.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Edith Cohen, Eran Halperin, Haim Kaplan, and Uri Zwick. 2003. Reachability and Distance Queries via 2-Hop Labels. SIAM J. Comput., Vol. 32, 5 (2003), 1338--1355.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Yixiang Fang, Reynold Cheng, Siqiang Luo, and Jiafeng Hu. 2016. Effective Community Search for Large Attributed Graphs. Proc. VLDB Endow., Vol. 9, 12 (2016), 1233--1244.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Roberto Grossi, Andrea Marino, and Luca Versari. 2018. Efficient Algorithms for Listing k Disjoint st-Paths in Graphs. In LATIN (Lecture Notes in Computer Science, Vol. 10807). Springer, 544--557.Google ScholarGoogle ScholarCross RefCross Ref
  12. Sairam Gurajada and Martin Theobald. 2016. Distributed Set Reachability. In SIGMOD. ACM, 1247--1261.Google ScholarGoogle Scholar
  13. Bernhard Haeupler, Telikepalli Kavitha, Rogers Mathew, Siddhartha Sen, and Robert Endre Tarjan. 2012. Incremental Cycle Detection, Topological Ordering, and Strong Component Maintenance. ACM Trans. Algorithms, Vol. 8, 1 (2012), 3:1--3:33.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Kongzhang Hao, Long Yuan, and Wenjie Zhang. 2021. Distributed Hop-Constrained s-t Simple Path Enumeration at Billion Scale. Proc. VLDB Endow., Vol. 15, 2 (2021), 169--182.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Mahmoud Abo Khamis, Hung Q. Ngo, and Atri Rudra. 2016. FAQ: Questions Asked Frequently. In PODS, 2016. ACM, 13--28.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Larkshmi Krishnamurthy, Joseph H. Nadeau, Gultekin Özsoyoglu, Z. Meral Ö zsoyoglu, Greg Schaeffer, Murat Tasan, and Wanhong Xu. 2003. Pathways Database System: An Integrated System for Biological Pathways. Bioinform., Vol. 19, 8 (2003), 930--937.Google ScholarGoogle Scholar
  17. Rohit Kumar and Toon Calders. 2018. 2SCENT: An Efficient Algorithm to Enumerate All Simple Temporal Cycles. Proc. VLDB Endow., Vol. 11, 11 (2018), 1441--1453.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Kisung Lee and Ling Liu. 2013. Scaling Queries over Big RDF Graphs with Semantic Hash Partitioning. Proc. VLDB Endow., Vol. 6, 14 (2013), 1894--1905.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Wentao Li, Miao Qiao, Lu Qin, Ying Zhang, Lijun Chang, and Xuemin Lin. 2019. Scaling Distance Labeling on Small-World Networks. In SIGMOD. ACM, 1060--1077.Google ScholarGoogle Scholar
  20. Wentao Li, Miao Qiao, Lu Qin, Ying Zhang, Lijun Chang, and Xuemin Lin. 2020. Scaling Up Distance Labeling on Graphs with Core-Periphery Properties. In SIGMOD. ACM, 1367--1381.Google ScholarGoogle Scholar
  21. Masaaki Nishino, Norihito Yasuda, Shin-ichi Minato, and Masaaki Nagata. 2017. Compiling Graph Substructures into Sentential Decision Diagrams. In AAAI. 1213--1221.Google ScholarGoogle Scholar
  22. You Peng, Xuemin Lin, Ying Zhang, Wenjie Zhang, and Lu Qin. 2022. Answering reachability and K-reach queries on large graphs with label constraints. VLDB J., Vol. 31, 1 (2022), 101--127.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. You Peng, Ying Zhang, Xuemin Lin, Wenjie Zhang, Lu Qin, and Jingren Zhou. 2019. Hop-constrained s-t Simple Path Enumeration: Towards Bridging Theory and Practice. Proc. VLDB Endow., Vol. 13, 4 (2019), 463--476.Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Michalis Potamias, Francesco Bonchi, Carlos Castillo, and Aristides Gionis. 2009. Fast shortest path distance estimation in large networks. In CIKM. ACM, 867--876.Google ScholarGoogle Scholar
  25. Xiafei Qiu, Wubin Cen, Zhengping Qian, You Peng, Ying Zhang, Xuemin Lin, and Jingren Zhou. 2018. Real-time Constrained Cycle Detection in Large Dynamic Graphs. Proc. VLDB Endow., Vol. 11, 12 (2018), 1876--1888.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Baoxu Shi and Tim Weninger. 2016. Discriminative predicate path mining for fact checking in knowledge graphs. Knowl. Based Syst., Vol. 104 (2016), 123--133.Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Prashant Shiralkar, Alessandro Flammini, Filippo Menczer, and Giovanni Luca Ciampaglia. 2017. Finding Streams in Knowledge Graphs to Support Fact Checking. In ICDM. IEEE Computer Society, 859--864.Google ScholarGoogle Scholar
  28. Shixuan Sun, Yuhang Chen, Bingsheng He, and Bryan Hooi. 2021. PathEnum: Towards Real-Time Hop-Constrained s-t Path Enumeration. In SIGMOD. ACM, 1758--1770.Google ScholarGoogle Scholar
  29. Lucien D. J. Valstar, George H. L. Fletcher, and Yuichi Yoshida. 2017. Landmark Indexing for Evaluation of Label-Constrained Reachability Queries. In SIGMOD. ACM, 345--358.Google ScholarGoogle Scholar
  30. Dong Wen, Yilun Huang, Ying Zhang, Lu Qin, Wenjie Zhang, and Xuemin Lin. 2020. Efficiently Answering Span-Reachability Queries in Large Temporal Graphs. In ICDE. IEEE, 1153--1164.Google ScholarGoogle Scholar
  31. Da Yan, James Cheng, Yi Lu, and Wilfred Ng. 2014. Blogel: A Block-Centric Framework for Distributed Computation on Real-World Graphs. Proc. VLDB Endow., Vol. 7, 14 (2014), 1981--1992.Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Norihito Yasuda, Teruji Sugaya, and Shin-ichi Minato. 2017. Fast Compilation of s-t Paths on a Graph for Counting and Enumeration. In AMBN (Proceedings of Machine Learning Research, Vol. 73). PMLR, 129--140.Google ScholarGoogle Scholar
  33. Yuanyuan Zeng, Kenli Li, Xu Zhou, Wensheng Luo, and Yunjun Gao. 2022a. An Efficient Index-Based Approach to Distributed Set Reachability on Small-World Graphs. IEEE Trans. Parallel Distributed Syst., Vol. 33, 10 (2022), 2358--2371.Google ScholarGoogle ScholarCross RefCross Ref
  34. Yuanyuan Zeng, Wangdong Yang, Xu Zhou, Guoqin Xiao, Yunjun Gao, and Kenli Li. 2022b. Distributed Set Label-Constrained Reachability Queries over Billion-Scale Graphs. In ICDE. IEEE.Google ScholarGoogle Scholar
  35. Chao Zhang, Angela Bonifati, and M. Tamer Özsu. 2023. An Overview of Reachability Indexes on Graphs. In SIGMOD, 2023. ACM, 61--68.Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Efficient Distributed Hop-Constrained Path Enumeration on Large-Scale Graphs

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image Proceedings of the ACM on Management of Data
      Proceedings of the ACM on Management of Data  Volume 2, Issue 1
      PACMMOD
      February 2024
      1874 pages
      EISSN:2836-6573
      DOI:10.1145/3654807
      Issue’s Table of Contents

      Copyright © 2024 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 26 March 2024
      Published in pacmmod Volume 2, Issue 1

      Permissions

      Request permissions about this article.

      Request Permissions

      Qualifiers

      • research-article
    • Article Metrics

      • Downloads (Last 12 months)53
      • Downloads (Last 6 weeks)30

      Other Metrics

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader