Linked Data Management

Hauswirth, Manfred; Wylot, Marcin; Grund, Martin; Groth, Paul; Cudré-Mauroux, Philippe

doi:10.1007/978-3-319-49340-4_9

Chapter
First Online: 26 February 2017

Abstract

The volume of Linked Data is growing exponentially, so a Linked Data management system must be able to handle ever-increasing amounts of data. Variety is also especially important in the Linked Data context. Despite its seemingly simple data model, Linked Data encodes rich and complex graphs that mix instance-level and schema-level data. Because Linked Data is schema-free (i.e., the schema is not strict), standard database techniques cannot be directly adopted to manage it. Although Linked Data can be organized as a table, querying a giant triple table becomes very costly because of the multiple nested joins that typical queries require. The heterogeneity of Linked Data also poses entirely new challenges to database systems, where managing provenance information is becoming a requirement. Linked Data queries usually involve multiple sources, and results can be produced in various ways for a specific scenario. Such heterogeneous data can incorporate knowledge about provenance, which can be leveraged to give users a reliable and understandable description of how a query result was derived, and to improve query execution performance thanks to the high selectivity of provenance information. In this chapter, we provide a detailed overview of current approaches specifically designed for Linked Data management. We focus on storage models, indexing techniques, and query execution strategies. Finally, we provide an overview of provenance models, definitions, and serialization techniques for Linked Data, and we survey the database management systems that implement techniques for managing provenance information in the context of Linked Data.
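
To make the join cost mentioned in the abstract concrete, the following is a minimal sketch, not taken from the chapter, of a single triple table held in an in-memory SQLite database. The table layout, the `ex:` identifiers, and the helper names `build_triple_table` and `people_in_berlin` are all illustrative assumptions. It shows that a query with three triple patterns already needs two self-joins on the same table.

```python
import sqlite3

# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:acme", "ex:locatedIn", "ex:berlin"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:worksFor", "ex:globex"),
    ("ex:globex", "ex:locatedIn", "ex:zurich"),
]

# Three triple patterns (is a Person, works for ?org, ?org located in Berlin)
# become three aliases of the same table, connected by two self-joins.
QUERY = """
SELECT t1.s
FROM triples AS t1
JOIN triples AS t2 ON t2.s = t1.s
JOIN triples AS t3 ON t3.s = t2.o
WHERE t1.p = 'rdf:type' AND t1.o = 'ex:Person'
  AND t2.p = 'ex:worksFor'
  AND t3.p = 'ex:locatedIn' AND t3.o = 'ex:berlin'
"""


def build_triple_table(triples):
    """Load all triples into one three-column table (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return conn


def people_in_berlin(conn):
    """Return subjects matching all three triple patterns."""
    return [row[0] for row in conn.execute(QUERY)]


if __name__ == "__main__":
    print(people_in_berlin(build_triple_table(TRIPLES)))
```

Each additional triple pattern in a query adds another self-join over the full table, which is one way to read the cost concern raised above. The sketch deliberately uses no indexes or alternative storage layouts.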

Notes

1. http://www.w3.org/Addressing/.
2. http://www.w3.org/Protocols/.
3. http://www.w3.org/RDF/.
4. http://www.w3.org/TR/sparql11-query/.
5. https://www.w3.org/TR/rdf11-concepts/#section-dataset.
6. In databases, indexes are used to locate data without scanning the entire data space, and thus to speed up retrieval operations. For more details about database concepts, we refer the reader to the literature introduced in Sect. 2.
7. Optimization statistics are used in databases to choose the best execution plan for a query. They contain information describing the distribution of various objects in a database.
8. From the database perspective, a context can be seen as a graph; thus, there is no difference between the two concepts in terms of data management. Here, we keep the original terminology of the authors of each approach.
9. http://bmagic.sourceforge.net/dGap.html.
10. https://jena.apache.org/documentation/tdb/.
11. http://dublincore.org/documents/dcmi-terms/.
12. http://inference-web.org/wiki/PML_3.0.
13. http://www.w3.org/TR/hcls-dataset/.

Author information

Authors and Affiliations

Technical University of Berlin (TU Berlin), Berlin, Germany
Manfred Hauswirth, Marcin Wylot, Martin Grund, Paul Groth & Philippe Cudré-Mauroux

Corresponding author

Correspondence to Marcin Wylot.

Editor information

Editors and Affiliations

School of Information Technologies, The University of Sydney, Sydney, New South Wales, Australia
Albert Y. Zomaya
The School of Computer Science, The University of New South Wales, Eveleigh, New South Wales, Australia
Sherif Sakr

About this chapter

Cite this chapter

Hauswirth, M., Wylot, M., Grund, M., Groth, P., Cudré-Mauroux, P. (2017). Linked Data Management. In: Zomaya, A., Sakr, S. (eds) Handbook of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-49340-4_9

DOI: https://doi.org/10.1007/978-3-319-49340-4_9
Published: 26 February 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49339-8
Online ISBN: 978-3-319-49340-4
eBook Packages: Computer Science, Computer Science (R0)
