Abstract
With the prevalence of graph data in a variety of domains, there is an increasing need for a language to query and manipulate graphs with heterogeneous attributes and structures. We present a graph query language (GraphQL) that supports bulk operations on graphs with arbitrary structures and annotated attributes. In this language, graphs are the basic unit of information and each query manipulates one or more collections of graphs at a time. The core of GraphQL is a graph algebra extended from the relational algebra in which the selection operator is generalized to graph pattern matching and a composition operator is introduced for rewriting matched graphs. Then, we investigate access methods of the selection operator. Pattern matching over large graphs is challenging due to the NP-completeness of subgraph isomorphism. We address this by a combination of techniques: use of neighborhood subgraphs and profiles, joint reduction of the search space, and optimization of the search order. Experimental results on real and synthetic large graphs demonstrate that graph-specific optimizations outperform an SQL-based implementation by orders of magnitude.
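As a rough illustration of the pruning idea mentioned above, the following Python sketch filters candidate nodes by a label profile before running a backtracking search. It is a minimal sketch under stated assumptions, not the chapter's implementation: the graph representation (undirected adjacency sets plus a label dictionary), the radius-1 profile, the function names `profile`, `candidates`, and `match`, and the "fewest candidates first" ordering heuristic are all hypothetical choices made for this example. The joint reduction step named in the abstract is not shown.

```python
from collections import Counter


def profile(adj, labels, node):
    """Multiset of labels of `node` and its direct neighbors.

    Assumption: a radius-1 neighborhood; the chapter may use other radii.
    """
    return Counter([labels[node]] + [labels[n] for n in adj[node]])


def candidates(pattern, p_labels, graph, g_labels):
    """For each pattern node, keep graph nodes that pass the profile test.

    A graph node v survives for pattern node u only if the labels agree and
    u's profile is contained (as a multiset) in v's profile.
    """
    g_profiles = {v: profile(graph, g_labels, v) for v in graph}
    result = {}
    for u in pattern:
        p_prof = profile(pattern, p_labels, u)
        result[u] = [
            v for v in graph
            if g_labels[v] == p_labels[u]
            # Counter subtraction keeps only positive counts, so an empty
            # result means every pattern label is covered by v's profile.
            and not (p_prof - g_profiles[v])
        ]
    return result


def match(pattern, p_labels, graph, g_labels):
    """Return all injective mappings of pattern nodes onto graph nodes that
    preserve labels and pattern edges (plain backtracking search)."""
    cand = candidates(pattern, p_labels, graph, g_labels)
    # Simple stand-in heuristic for search order: most selective node first.
    order = sorted(pattern, key=lambda u: len(cand[u]))
    results = []

    def extend(i, mapping):
        if i == len(order):
            results.append(dict(mapping))
            return
        u = order[i]
        used = set(mapping.values())
        for v in cand[u]:
            if v in used:
                continue
            # Every already-mapped pattern neighbor of u must be adjacent to v.
            if all(mapping[w] in graph[v] for w in pattern[u] if w in mapping):
                mapping[u] = v
                extend(i + 1, mapping)
                del mapping[u]

    extend(0, {})
    return results


# Toy data: a labeled triangle pattern and a small graph with one extra node.
pattern = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
p_labels = {"x": "A", "y": "B", "z": "C"}
graph = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
g_labels = {1: "A", 2: "B", 3: "C", 4: "B"}

print(candidates(pattern, p_labels, graph, g_labels))
print(match(pattern, p_labels, graph, g_labels))
```

In this toy input, node 4 carries the right label for pattern node `y` but has no neighbor labeled `C`, so the profile test discards it before the search starts. The filter is only a necessary condition; the backtracking step still verifies every edge.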
This is a revised and extended version of the article “Graphs-at-a-time: Query Language and Access Methods for Graph Databases”, Huahai He and Ambuj K. Singh, in Proceedings of the 2008 ACM SIGMOD Conference, http://doi.acm.org/10.1145/1376616.1376660. Reprinted with permission of ACM.
Work done while at the University of California, Santa Barbara.
Copyright information
© 2010 Springer-Verlag US
About this chapter
Cite this chapter
He, H., Singh, A.K. (2010). Query Language and Access Methods for Graph Databases. In: Aggarwal, C., Wang, H. (eds) Managing and Mining Graph Data. Advances in Database Systems, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6045-0_4
DOI: https://doi.org/10.1007/978-1-4419-6045-0_4
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-6044-3
Online ISBN: 978-1-4419-6045-0
eBook Packages: Computer Science, Computer Science (R0)