Abstract
The enumeration of hop-constrained simple paths is a building block in many graph-based areas. Because the search space in large-scale graphs is enormous, a single machine can hardly satisfy both the efficiency and the memory requirements, which creates an urgent need for efficient distributed methods. In practice, directly extending centralized methods to a distributed environment inevitably produces a large number of intermediate results, causing a memory crisis and degrading query performance. The state-of-the-art distributed method, HybridEnum, uses a hybrid search paradigm to enumerate simple paths. However, it extensively explores redundant vertices that do not lie on any simple path, which results in poor query performance. To alleviate this problem, we design a distributed approach, DistriEnum, that optimizes query performance and scalability with well-bounded memory consumption. First, DistriEnum adopts a graph reduction strategy to rule out redundant vertices that do not satisfy the hop constraint. Then, a core search paradigm is designed to simultaneously reduce the traversal of shared subpaths and the storage of intermediate results. Moreover, DistriEnum is equipped with a task division strategy to theoretically achieve workload balance. Finally, a vertex migration strategy is devised to reduce the communication cost during enumeration. Comprehensive experimental results on 10 real-world graphs demonstrate that DistriEnum achieves a speedup of up to 3 orders of magnitude over HybridEnum in query performance and exhibits superior scalability, communication cost, and memory consumption.
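To make the problem concrete, the following is a minimal single-machine sketch of hop-constrained s-t simple path enumeration with a distance-based vertex reduction. It is not DistriEnum itself: the abstract does not specify how the reduction is computed, so the BFS-based rule used here (drop any vertex whose hop distance from `s` plus hop distance to `t` exceeds `k`) is an assumption, and the names `bfs_hops`, `reverse_graph`, and `enumerate_paths` are hypothetical. The graph is assumed to be directed and given as an adjacency dictionary.

```python
from collections import deque


def bfs_hops(adj, source):
    """Return the minimum hop count from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def reverse_graph(adj):
    """Build the adjacency dictionary of the graph with every edge reversed."""
    rev = {}
    for u, neighbors in adj.items():
        for v in neighbors:
            rev.setdefault(v, []).append(u)
    return rev


def enumerate_paths(adj, s, t, k):
    """List all simple paths from s to t that use at most k edges (hops)."""
    from_s = bfs_hops(adj, s)
    to_t = bfs_hops(reverse_graph(adj), t)

    # Assumed reduction rule: a vertex can lie on a qualifying path only if
    # its shortest detour s -> v -> t already fits within k hops.
    keep = {v for v in from_s if v in to_t and from_s[v] + to_t[v] <= k}

    results = []
    if s not in keep:
        return results

    path, on_path = [s], {s}

    def dfs(u):
        if u == t:
            results.append(list(path))
            return
        for v in adj.get(u, ()):
            # Stepping to v uses len(path) hops in total; the remaining
            # budget must still cover the shortest distance from v to t.
            if v in keep and v not in on_path and len(path) + to_t[v] <= k:
                path.append(v)
                on_path.add(v)
                dfs(v)
                path.pop()
                on_path.remove(v)

    dfs(s)
    return results
```

Calling `enumerate_paths(adj, "s", "t", 3)` on a small adjacency dictionary returns each qualifying path as a list of vertices. The recursion depth is bounded by `k`, and the reduction step only removes vertices that cannot appear in any result, so it changes the running time but not the output.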
Index Terms
- Efficient Distributed Hop-Constrained Path Enumeration on Large-Scale Graphs