Abstract
Reachability queries are among the fundamental queries in graph databases. The main idea behind answering them is to assign labels to vertices so that the reachability between any two vertices can be determined from the labels alone. Although several approaches have been proposed for building these reachability labels, two issues remain open: how to handle the increasingly large number of vertices in real-world graphs, and how to find the best tradeoff among labeling size, query answering time, and construction time. In this article, we introduce a novel graph structure, referred to as path-tree, to help label very large graphs. The path-tree cover is a spanning subgraph of the input graph G in a tree shape. We show that path-tree can be generalized to chain-tree, which theoretically can have a smaller labeling cost. On top of the path-tree and chain-tree indexes, we also introduce a new compression scheme that groups vertices with similar labels together to further reduce the labeling size. In addition, we propose an efficient incremental update algorithm for dynamic index maintenance. Finally, we demonstrate both analytically and empirically the effectiveness and efficiency of our new approaches.
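To make the labeling idea concrete, the following is a minimal sketch of classic interval labeling on a rooted tree. It is not the path-tree scheme described in this article, only an illustration of how labels alone can answer a reachability query. The names `interval_labels`, `reaches`, and the example tree are hypothetical.

```python
_DONE = object()  # sentinel: marks an exhausted child iterator


def interval_labels(children, root):
    """Label each vertex of a rooted tree with (start, end).

    start is the vertex's pre-order number; end is the largest
    pre-order number found in its subtree.
    """
    labels = {}
    start = {root: 0}
    counter = 1
    # Iterative DFS: each stack entry is (vertex, iterator over its children).
    stack = [(root, iter(children.get(root, ())))]
    while stack:
        vertex, child_iter = stack[-1]
        child = next(child_iter, _DONE)
        if child is _DONE:
            # Every descendant has been numbered, so close the interval.
            labels[vertex] = (start[vertex], counter - 1)
            stack.pop()
        else:
            start[child] = counter
            counter += 1
            stack.append((child, iter(children.get(child, ()))))
    return labels


def reaches(labels, u, v):
    """Return True if v lies in the subtree rooted at u (u reaches v)."""
    start_u, end_u = labels[u]
    start_v, _ = labels[v]
    return start_u <= start_v <= end_u


# Hypothetical example tree: a -> b, a -> c, b -> d
tree = {"a": ["b", "c"], "b": ["d"]}
labels = interval_labels(tree, "a")
print(reaches(labels, "a", "d"))  # True
print(reaches(labels, "c", "d"))  # False
```

In this sketch a query costs two integer comparisons and each vertex stores two integers. General directed graphs have edges that a single tree does not cover, and these are what make labeling size, query time, and construction time trade off against one another.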
Supplemental Material
Online appendix to "Path-tree: An efficient reachability indexing scheme for large directed graphs" (article 7).
Index Terms
- Path-tree: An efficient reachability indexing scheme for large directed graphs