ABSTRACT
The World Wide Web Consortium (W3C) recently introduced property paths in SPARQL 1.1, a query language for RDF data. Property paths allow SPARQL queries to evaluate regular expressions over graph data. However, they differ from standard regular expressions in several notable aspects. For example, they have a limited form of negation, they have numerical occurrence indicators as syntactic sugar, and their semantics on graphs is defined in a non-standard manner. We formalize the W3C semantics of property paths and investigate various query evaluation problems on graphs. More specifically, let x and y be two nodes in an edge-labeled graph and r be an expression. We study the complexities of (1) deciding whether there exists a path from x to y that matches r and (2) counting how many paths from x to y match r. Our main results show that, compared to an alternative semantics of regular expressions on graphs, the complexity of (1) and (2) under W3C semantics is significantly higher. Whereas the alternative semantics remains in polynomial time for large fragments of expressions, the W3C semantics makes problems (1) and (2) intractable almost immediately.
As a side result, we prove that the membership problem for regular expressions with numerical occurrence indicators and negation is in polynomial time.
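To make problem (1) concrete, the following is a minimal sketch of the textbook product construction for the alternative semantics, in which a path may revisit nodes and edges freely. It is not the W3C semantics and not an algorithm taken from the paper. The function name `path_exists`, the triple-based graph encoding, and the hand-built automaton for the expression a*b are all assumptions made for illustration.

```python
from collections import deque


def path_exists(edges, nfa, start, accepting, x, y):
    """Return True if some path from x to y spells a word accepted by the NFA.

    edges: iterable of (source, label, target) triples of the graph.
    nfa: dict mapping (state, label) -> set of successor states
         (no epsilon transitions).
    """
    # Index outgoing edges by source node.
    out = {}
    for u, a, v in edges:
        out.setdefault(u, []).append((a, v))

    # Breadth-first search over (graph node, NFA state) pairs.
    seen = {(x, start)}
    queue = deque(seen)
    while queue:
        node, state = queue.popleft()
        if node == y and state in accepting:
            return True
        for label, next_node in out.get(node, []):
            for next_state in nfa.get((state, label), ()):
                pair = (next_node, next_state)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return False


# Hypothetical NFA for a*b: state 0 loops on "a" and moves to state 1 on "b".
NFA_A_STAR_B = {(0, "a"): {0}, (0, "b"): {1}}

# Hypothetical edge-labeled graph with a cycle between n1 and n2.
EDGES = [("n1", "a", "n2"), ("n2", "a", "n1"), ("n2", "b", "n3")]

if __name__ == "__main__":
    print(path_exists(EDGES, NFA_A_STAR_B, 0, {1}, "n1", "n3"))
```

The search visits each (node, state) pair at most once, so its running time is polynomial in the size of the graph and the automaton. This is the kind of tractability that the abstract attributes to the alternative semantics for large fragments of expressions.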
Index Terms
The complexity of evaluating path expressions in SPARQL