ABSTRACT
Efficiently processing queries against very large graphs is an important research topic, driven largely by emerging real-world applications as diverse as XML databases, GIS, web mining, social network analysis, ontologies, and bioinformatics. Graph reachability in particular has attracted considerable research attention: reachability queries are not only common on graph databases, they also serve as fundamental operations for many other graph queries. The main idea behind answering reachability queries in graphs is to build indices based on reachability labels. Essentially, each vertex in the graph is assigned labels such that the reachability between any two vertices can be determined from their labels alone. Several approaches have been proposed for building these reachability labels, among them interval labeling (tree cover) and 2-hop labeling. However, because many real-world graphs contain a very large number of vertices (some easily contain millions), the computational cost and index size of the labels built by existing methods would prove too expensive to be practical. In this paper, we introduce a novel graph structure, referred to as path-tree, to help label very large graphs. The path-tree cover is a spanning subgraph of G in a tree shape. We demonstrate both analytically and empirically the effectiveness of our new approaches.
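To make the labeling idea concrete, here is a minimal sketch of interval labeling restricted to a plain rooted tree. It is not the path-tree scheme the paper introduces, and it ignores the non-tree edges that a full tree-cover index for a DAG must also handle. The function names `interval_labels` and `reaches` and the example tree are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

Label = Tuple[int, int]


def interval_labels(tree: Dict[str, List[str]], root: str) -> Dict[str, Label]:
    """Label each vertex with (low, post).

    post is the vertex's postorder number; low is the smallest postorder
    number found in its subtree. Recursive, so only suited to small trees.
    """
    labels: Dict[str, Label] = {}
    counter = 0

    def visit(v: str) -> None:
        nonlocal counter
        low = counter + 1  # the next postorder number lands in v's subtree
        for child in tree.get(v, []):
            visit(child)
        counter += 1       # v is numbered after all of its descendants
        labels[v] = (low, counter)

    visit(root)
    return labels


def reaches(labels: Dict[str, Label], u: str, v: str) -> bool:
    """u reaches v in the tree iff v's postorder number lies in u's interval."""
    low_u, post_u = labels[u]
    return low_u <= labels[v][1] <= post_u


# Hypothetical example tree: a -> b, a -> c, b -> d
example_tree = {"a": ["b", "c"], "b": ["d"]}
example_labels = interval_labels(example_tree, "a")
```

With these labels, a query such as `reaches(example_labels, "a", "d")` is answered by two integer comparisons, with no traversal of the graph at query time.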