TripleProv: efficient processing of lineage queries in a native RDF store

Published: 07 April 2014

Abstract

Given the heterogeneity of the data one can find on the Linked Data cloud, being able to trace back the provenance of query results is rapidly becoming a must-have feature of RDF systems. While provenance models have been extensively discussed in recent years, little attention has been given to the efficient implementation of provenance-enabled queries inside data stores. This paper introduces TripleProv: a new system extending a native RDF store to efficiently handle such queries. TripleProv implements two different storage models to physically co-locate lineage and instance data, and for each of them implements algorithms for tracing provenance at two granularity levels. In the following, we present the overall architecture of our system, its different lineage storage models, and the various query execution strategies we have implemented to efficiently answer provenance-enabled queries. In addition, we present the results of a comprehensive empirical evaluation of our system over two different datasets and workloads.
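As a rough illustration of what a provenance-enabled query returns, the following minimal Python sketch computes a lineage expression for each answer of a small subject join. It is not TripleProv's implementation: the toy quads, the function names `match`, `lineage`, and `sources_only`, and the reading of the two granularity levels as "which sources contributed" versus "how the sources were combined" are all assumptions made for illustration. The sketch uses ⊕ to combine alternative sources for one triple pattern and ⊗ to combine sources across joined patterns, in the style of provenance polynomials.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, predicate, object, source) quads.
QUADS = [
    ("alice", "type", "Person", "src1"),
    ("alice", "type", "Person", "src2"),
    ("alice", "worksAt", "ACME", "src3"),
    ("bob", "type", "Person", "src1"),
]


def match(quads, predicate, obj):
    """Map each subject matching (?, predicate, obj) to its asserting sources."""
    hits = defaultdict(set)
    for s, p, o, src in quads:
        if p == predicate and o == obj:
            hits[s].add(src)
    return hits


def _joined(quads, patterns):
    """Evaluate every pattern and keep the subjects present in all of them."""
    per_pattern = [match(quads, p, o) for p, o in patterns]
    subjects = set.intersection(*(set(h) for h in per_pattern))
    return per_pattern, subjects


def lineage(quads, patterns):
    """Finer view (assumed): a polynomial showing how sources were combined."""
    per_pattern, subjects = _joined(quads, patterns)
    result = {}
    for s in sorted(subjects):
        # Alternatives for one pattern are summed; joined patterns multiply.
        factors = ["(" + " ⊕ ".join(sorted(h[s])) + ")" for h in per_pattern]
        result[s] = " ⊗ ".join(factors)
    return result


def sources_only(quads, patterns):
    """Coarser view (assumed): only the set of sources behind each answer."""
    per_pattern, subjects = _joined(quads, patterns)
    return {s: set().union(*(h[s] for h in per_pattern)) for s in subjects}


if __name__ == "__main__":
    query = [("type", "Person"), ("worksAt", "ACME")]
    print(lineage(QUADS, query))
    print(sources_only(QUADS, query))
```

In this toy setting only `alice` satisfies both patterns, so `bob` is dropped by the join and contributes nothing to the lineage. A real store would evaluate the patterns against its indexes rather than scanning a list.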

Published In

WWW '14: Proceedings of the 23rd international conference on World wide web
April 2014
926 pages
ISBN: 9781450327442
DOI: 10.1145/2566486

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. RDF
  2. linked data
  3. physical design
  4. provenance polynomials
  5. provenance queries
  6. triple store

Qualifiers

  • Research-article

Conference

WWW '14
Sponsor:
  • IW3C2

Acceptance Rates

WWW '14 Paper Acceptance Rate 84 of 645 submissions, 13%;
Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
