Abstract
The volume of Linked Data is growing exponentially, so a Linked Data management system must be able to cope with ever-increasing amounts of data. Additionally, in the Linked Data context, variety is especially important. In spite of its seemingly simple data model, Linked Data actually encodes rich and complex graphs mixing both instance and schema-level data. Since Linked Data is schema-free (i.e., the schema is not strict), standard database techniques cannot be directly adopted to manage it. Even though Linked Data can be organized in the form of a table, querying a giant triple table becomes very costly because of the multiple nested joins required by typical queries. The heterogeneity of Linked Data also poses entirely new challenges to database systems, where managing provenance information is becoming a requirement. Linked Data queries usually involve multiple sources, and results can be produced in various ways for a specific scenario. Such heterogeneous data can incorporate knowledge on provenance, which can be leveraged to provide users with a reliable and understandable description of how a query result was derived, and to improve query execution performance thanks to the high selectivity of provenance information. In this chapter, we provide a detailed overview of current approaches specifically designed for Linked Data management. We focus on storage models, indexing techniques, and query execution strategies. Finally, we provide an overview of provenance models, definitions, and serialization techniques for Linked Data. We also survey the database management systems implementing techniques to manage provenance information in the context of Linked Data.
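To make the remark about nested joins concrete, the following is a minimal sketch and not a system surveyed in the chapter. It assumes a single three-column triple table held in an in-memory SQLite database; the `ex:` identifiers, the table layout, and the function names are hypothetical and chosen only for illustration. A question with three triple patterns already needs two self-joins of the same table.

```python
import sqlite3

# Hypothetical example data: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:initech"),
    ("ex:acme", "ex:locatedIn", "ex:zurich"),
    ("ex:initech", "ex:locatedIn", "ex:austin"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:name", "Bob"),
]

# One alias of the triple table per triple pattern; every additional
# pattern in a query adds another self-join.
QUERY = """
SELECT t3.o
FROM triples AS t1
JOIN triples AS t2 ON t2.s = t1.o
JOIN triples AS t3 ON t3.s = t1.s
WHERE t1.p = 'ex:worksFor'
  AND t2.p = 'ex:locatedIn' AND t2.o = ?
  AND t3.p = 'ex:name'
"""


def build_triple_table(triples):
    """Load all triples into one in-memory table with columns s, p, o."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return conn


def names_of_people_working_in(conn, city):
    """Return the names of people whose employer is located in `city`."""
    return [row[0] for row in conn.execute(QUERY, (city,))]


if __name__ == "__main__":
    conn = build_triple_table(TRIPLES)
    print(names_of_people_working_in(conn, "ex:zurich"))
```

Each alias (`t1`, `t2`, `t3`) ranges over the full table, which is why the cost grows quickly with both the table size and the number of patterns in a query.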
Notes
- 6. In databases, indexes are used to locate data without scanning the entire data space, and thus to improve the speed of retrieval operations. For more details about database concepts, we refer the reader to the literature introduced in Sect. 2.
- 7. Optimization statistics are used in databases to choose the best execution plan for a query. They contain information describing the distribution of various objects in a database.
- 8. From the database perspective, a context can be seen as a graph, so there is no difference between these two concepts in terms of data management. Here, we keep the original terminology of the authors of each approach.
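Notes 6 and 7 can be illustrated with a small sketch. It is an assumption-laden toy rather than the design of any system discussed in the chapter: the triples, the `ex:` names, and all function names are hypothetical. It shows a predicate index that answers a lookup without scanning every triple, and per-predicate counts used as simple statistics to decide which triple pattern to evaluate first.

```python
from collections import Counter, defaultdict

# Hypothetical example data: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:alice", "ex:type", "ex:Person"),
    ("ex:bob", "ex:type", "ex:Person"),
    ("ex:acme", "ex:type", "ex:Company"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
]


def scan(triples, predicate):
    """Baseline: inspect every triple to find those with `predicate`."""
    return [t for t in triples if t[1] == predicate]


def build_predicate_index(triples):
    """Index: map each predicate to its triples, built once up front."""
    index = defaultdict(list)
    for triple in triples:
        index[triple[1]].append(triple)
    return index


def predicate_statistics(triples):
    """Statistics: how many triples exist for each predicate."""
    return Counter(p for _, p, _ in triples)


def order_by_selectivity(predicates, stats):
    """Plan choice: evaluate the rarest (most selective) predicate first."""
    return sorted(predicates, key=lambda p: stats[p])


if __name__ == "__main__":
    index = build_predicate_index(TRIPLES)
    stats = predicate_statistics(TRIPLES)
    print(index["ex:worksFor"])
    print(order_by_selectivity(["ex:type", "ex:name", "ex:worksFor"], stats))
```

The index trades extra memory and build time for lookups that avoid a full scan; the statistics let a planner start with the pattern that produces the fewest intermediate results.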
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Hauswirth, M., Wylot, M., Grund, M., Groth, P., Cudré-Mauroux, P. (2017). Linked Data Management. In: Zomaya, A., Sakr, S. (eds) Handbook of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-49340-4_9
DOI: https://doi.org/10.1007/978-3-319-49340-4_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49339-8
Online ISBN: 978-3-319-49340-4
eBook Packages: Computer Science, Computer Science (R0)