skip to main content
10.1145/2457317.2457353acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article

On implementing provenance-aware regular path queries with relational query engines

Published: 18 March 2013 Publication History

Abstract

Use of graphs is growing rapidly in social networks, semantic web, biological databases, scientific workflow provenance, and other areas. Regular Path Queries (RPQs) can be seen as a core graph query language to answer pattern-based reachability queries. Unfortunately, the number of freely available systems for querying graphs using RPQs is rather limited, and available implementations do not provide direct support for a number of desirable variants of RPQs, e.g., to return those edges that are contained in some (or all) paths that match the given regular expression R. Thus, by returning not just a pair (x, y) of end points of paths that match R, but also "witness edges" (u, v) inbetween, our RPQ variants can be understood as returning additional provenance information about the answer (x, y), i.e., those edges (u, v) that are in some (or all) paths from x to y matching R. We propose a number of such RPQ variants and show how they can be implemented using either Datalog or a suitable RDBMS. Our initial experimental results indicate that RPQs and our provenance-aware variants (RPQProv), when implemented using conventional relational technologies, yield reasonable performance even for relatively large graphs. On the other hand, the overhead associated with some of these variants also makes efficient handling of provenance-aware graph queries an interesting challenge for future research.

References

[1]
F. Afrati and F. Toni. Chain queries expressible by linear datalog programs. In Deductive Databases and Logic Programming (DDLP), pages 49--58, 1997.
[2]
F. N. Afrati and J. D. Ullman. Transitive closure and recursive datalog implemented on clusters. In Proceedings of the 15th International Conference on Extending Database Technology, EDBT '12, pages 132--143, New York, NY, USA, 2012. ACM.
[3]
M. Arenas, S. Conca, and J. Pérez. Counting beyond a yottabyte, or how sparql 1.1 property paths will prevent adoption of the standard. In Proceedings of the 21st international conference on World Wide Web, WWW '12, pages 629--638, New York, NY, USA, 2012. ACM.
[4]
V. Cuevas-Vicenttín, S. Dey, B. Ludäscher, M. Li Yuan Wang, and T. Song. Modeling and querying scientific workflow provenance in the d-opm. In 7th Workshop on Workflows in Support of Large-Scale Science (WORKS), november 2012.
[5]
O. De Moor, D. Lacey, and E. Van Wyk. Universal regular path queries. Higher Order Symbol. Comput., 16(1-2):15--35, Mar. 2003.
[6]
J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. Commun. ACM, 51(1):107--113, Jan. 2008.
[7]
S. Dey, S. Köhler, S. Bowers, and B. Ludäscher. Datalog as a lingua franca for provenance querying and reasoning. In Workshop on the Theory and Practice of Provenance (TaPP), 2012.
[8]
W. Fan, X. Wang, and Y. Wu. Performance guarantees for distributed reachability queries. Proc. VLDB Endow., 5(11):1304--1316, July 2012.
[9]
R. Kabler, Y. E. Ioannidis, and M. J. Carey. Performance evaluation of algorithms for transitive closure. Inf. Syst., 17(5):415--441, Sept. 1992.
[10]
A. Koschmieder and U. Leser. Regular path queries on large graphs. In Proceedings of the 24th international conference on Scientific and Statistical Database Management, SSDBM'12, pages 177--194, Berlin, Heidelberg, 2012. Springer-Verlag.
[11]
Y. A. Liu, T. Rothamel, F. Yu, S. D. Stoller, and N. Hu. Parametric regular path queries. In Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation, PLDI '04, pages 219--230, New York, NY, USA, 2004. ACM.
[12]
G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, SIGMOD '10, pages 135--146, New York, NY, USA, 2010. ACM.
[13]
A. O. Mendelzon and P. T. Wood. Finding regular simple paths in graph databases. SIAM J. Comput., 24(6):1235--1258, Dec. 1995.
[14]
L. Moreau, B. Clifford, J. Freire, J. Futrelle, Y. Gil, P. Groth, N. Kwasnikowska, S. Miles, P. Missier, J. Myers, B. Plale, Y. Simmhan, E. Stephan, and J. V. den Bussche. The open provenance model core specification (v1.1). Future Gener. Comput. Syst., 27(6):743--756, June 2011.
[15]
J. D. Ullman and A. Van Gelder. Parallel complexity of logical query programs. In Proceedings of the 27th Annual Symposium on Foundations of Computer Science, SFCS '86, pages 438--454, Washington, DC, USA, 1986. IEEE Computer Society.
[16]
M. L. Y. Wang. Querying Provenance as Regular Path Queries with Relational Databases. Master's thesis, University of California, Davis, 2012.

Cited By

View all
  • (2024)Efficient Regular Simple Path Queries under Transitive Restricted ExpressionsProceedings of the VLDB Endowment10.14778/3654621.365463617:7(1710-1722)Online publication date: 1-Mar-2024
  • (2024)Materialized View Selection & View-Based Query Planning for Regular Path QueriesProceedings of the ACM on Management of Data10.1145/36549552:3(1-26)Online publication date: 30-May-2024
  • (2024)MWP: Multi-Window Parallel Evaluation of Regular Path Queries on Streaming GraphsProceedings of the ACM on Management of Data10.1145/36392602:1(1-26)Online publication date: 26-Mar-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Other conferences
EDBT '13: Proceedings of the Joint EDBT/ICDT 2013 Workshops
March 2013
423 pages
ISBN:9781450315999
DOI:10.1145/2457317
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 March 2013

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. datalog
  2. graph querying
  3. regular path queries

Qualifiers

  • Research-article

Conference

EDBT/ICDT '13

Acceptance Rates

EDBT '13 Paper Acceptance Rate 7 of 10 submissions, 70%;
Overall Acceptance Rate 7 of 10 submissions, 70%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)9
  • Downloads (Last 6 weeks)0
Reflects downloads up to 14 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Efficient Regular Simple Path Queries under Transitive Restricted ExpressionsProceedings of the VLDB Endowment10.14778/3654621.365463617:7(1710-1722)Online publication date: 1-Mar-2024
  • (2024)Materialized View Selection & View-Based Query Planning for Regular Path QueriesProceedings of the ACM on Management of Data10.1145/36549552:3(1-26)Online publication date: 30-May-2024
  • (2024)MWP: Multi-Window Parallel Evaluation of Regular Path Queries on Streaming GraphsProceedings of the ACM on Management of Data10.1145/36392602:1(1-26)Online publication date: 26-Mar-2024
  • (2023)Integrating Connection Search in Graph Queries2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00200(2607-2620)Online publication date: Apr-2023
  • (2023)Optimizing RPQs over a compact graph representationThe VLDB Journal10.1007/s00778-023-00811-233:2(349-374)Online publication date: 7-Sep-2023
  • (2023)More Power to SPARQL: From Paths to TreesThe Semantic Web: ESWC 2023 Satellite Events10.1007/978-3-031-43458-7_6(32-36)Online publication date: 21-Oct-2023
  • (2022)Time- and Space-Efficient Regular Path Queries2022 IEEE 38th International Conference on Data Engineering (ICDE)10.1109/ICDE53745.2022.00277(3091-3105)Online publication date: May-2022
  • (2022)Provenance in Temporal Interaction Networks2022 IEEE 38th International Conference on Data Engineering (ICDE)10.1109/ICDE53745.2022.00216(2277-2290)Online publication date: May-2022
  • (2021)Distributed processing of regular path queries in RDF graphsKnowledge and Information Systems10.1007/s10115-020-01536-2Online publication date: 13-Jan-2021
  • (2021)A survey of RDF stores & SPARQL engines for querying knowledge graphsThe VLDB Journal10.1007/s00778-021-00711-331:3(1-26)Online publication date: 13-Nov-2021
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media