DOI: 10.1145/2736277.2741143

Executing Provenance-Enabled Queries over Web Data

Published: 18 May 2015

Abstract

The proliferation of heterogeneous Linked Data on the Web poses new challenges to database systems. In particular, because of this heterogeneity, the capacity to store, track, and query provenance data is becoming a pivotal feature of modern triple stores. In this paper, we tackle the problem of efficiently executing provenance-enabled queries over RDF data. We propose, implement and empirically evaluate five different query execution strategies for RDF queries that incorporate knowledge of provenance. The evaluation is conducted on Web Data obtained from two different Web crawls (The Billion Triple Challenge, and the Web Data Commons). Our evaluation shows that using an adaptive query materialization execution strategy performs best in our context. Interestingly, we find that because provenance is prevalent within Web Data and is highly selective, it can be used to improve query processing performance. This is a counterintuitive result as provenance is often associated with additional overhead.
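
As a rough illustration of what a provenance-enabled query involves, the sketch below stores each RDF triple together with its source (a quad) and restricts a triple-pattern match to a set of trusted sources before matching. It is not one of the five strategies evaluated in the paper. The names `Quad`, `match`, `provenance_enabled_match`, and all example IRIs are hypothetical and chosen only for this example.

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class Quad(NamedTuple):
    """An RDF triple plus the source (provenance) it was obtained from."""
    subject: str
    predicate: str
    obj: str
    source: str


def match(quads: Iterable[Quad],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Quad]:
    """Return the quads that match a triple pattern; None acts as a wildcard."""
    return [q for q in quads
            if (subject is None or q.subject == subject)
            and (predicate is None or q.predicate == predicate)
            and (obj is None or q.obj == obj)]


def provenance_enabled_match(quads: Iterable[Quad],
                             trusted_sources: Set[str],
                             **pattern: Optional[str]) -> List[Quad]:
    """Keep only quads from trusted sources, then evaluate the pattern.

    Filtering on provenance first shrinks the data the pattern has to scan,
    which is the intuition behind provenance being highly selective.
    """
    scoped = [q for q in quads if q.source in trusted_sources]
    return match(scoped, **pattern)


# Hypothetical data: three statements crawled from two made-up sources.
DATA = [
    Quad("ex:alice", "ex:worksAt", "ex:acme", "http://a.example/"),
    Quad("ex:alice", "ex:worksAt", "ex:globex", "http://b.example/"),
    Quad("ex:bob", "ex:worksAt", "ex:acme", "http://b.example/"),
]

all_results = match(DATA, predicate="ex:worksAt")
trusted_results = provenance_enabled_match(
    DATA, {"http://a.example/"}, predicate="ex:worksAt")
```

Here the unrestricted pattern matches all three statements, while the provenance-scoped query only considers statements from the single trusted source. A real triple store would use indexes rather than list scans; the sketch only shows the ordering of the two filtering steps.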

      Published In

      WWW '15: Proceedings of the 24th International Conference on World Wide Web
      May 2015
      1460 pages
      ISBN:9781450334693

      Sponsors

      • IW3C2: International World Wide Web Conference Committee

      Publisher

      International World Wide Web Conferences Steering Committee

      Republic and Canton of Geneva, Switzerland

      Author Tags

      1. RDF
      2. RDF data management
      3. linked data
      4. provenance
      5. provenance queries
      6. web data

      Qualifiers

      • Research-article

      Funding Sources

      • Dutch national program
      • Swiss National Science Foundation

      Conference

      WWW '15
      Sponsor:
      • IW3C2

      Acceptance Rates

      WWW '15 Paper Acceptance Rate 131 of 929 submissions, 14%;
      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

      Article Metrics

      • Downloads (Last 12 months): 8
      • Downloads (Last 6 weeks): 1
      Reflects downloads up to 18 Feb 2025

      Cited By

      • (2023) Distributed Subweb Specifications for Traversing the Web. Theory and Practice of Logic Programming, pp. 1-27. DOI: 10.1017/S1471068423000054. Online publication date: 25-Apr-2023
      • (2022) Link Traversal with Distributed Subweb Specifications. Rules and Reasoning, pp. 62-79. DOI: 10.1007/978-3-030-91167-6_5. Online publication date: 1-Jan-2022
      • (2020) State-of-the-Art Approaches for Meta-Knowledge Assertion in the Web of Data. IETE Technical Review, pp. 1-38. DOI: 10.1080/02564602.2020.1819891. Online publication date: 23-Sep-2020
      • (2020) Provenance-Aware Knowledge Representation: A Survey of Data Models and Contextualized Knowledge Graphs. Data Science and Engineering. DOI: 10.1007/s41019-020-00118-0. Online publication date: 8-May-2020
      • (2018) A survey of simulation provenance systems. Human-centric Computing and Information Sciences, 8:1, pp. 1-29. DOI: 10.1186/s13673-018-0150-9. Online publication date: 1-Dec-2018
      • (2018) Provenance Management for Linked Data. Linked Data, pp. 181-195. DOI: 10.1007/978-3-319-73515-3_8. Online publication date: 2-Mar-2018
      • (2018) Answering Provenance-Aware Queries on RDF Data Cubes Under Memory Budgets. The Semantic Web – ISWC 2018, pp. 547-565. DOI: 10.1007/978-3-030-00671-6_32. Online publication date: 18-Sep-2018
      • (2017) Linked data processing provenance. Proceedings of the International Conference on Web Intelligence, pp. 88-96. DOI: 10.1145/3106426.3106495. Online publication date: 23-Aug-2017
      • (2017) Storing, Tracking, and Querying Provenance in Linked Data. IEEE Transactions on Knowledge and Data Engineering, 29:8, pp. 1751-1764. DOI: 10.1109/TKDE.2017.2690299. Online publication date: 1-Aug-2017
      • (2017) Linked Data Management. Handbook of Big Data Technologies, pp. 307-338. DOI: 10.1007/978-3-319-49340-4_9. Online publication date: 26-Feb-2017
