Linked Data Management

Abstract

The volume of Linked Data is growing exponentially, so a Linked Data management system must cope with ever-increasing amounts of data. In the Linked Data context, variety is especially important as well. Despite its seemingly simple data model, Linked Data encodes rich and complex graphs that mix instance-level and schema-level data. Because Linked Data is schema-free (i.e., the schema is not strict), standard database techniques cannot be directly adopted to manage it. Although Linked Data can be organized in the form of a table, querying a giant triple table becomes very costly because of the multiple nested joins required by typical queries. The heterogeneity of Linked Data also poses entirely new challenges to database systems, where managing provenance information is becoming a requirement. Linked Data queries usually involve multiple sources, and results can be produced in various ways for a specific scenario. Such heterogeneous data can incorporate knowledge about provenance, which can be leveraged to give users a reliable and understandable description of how a query result was derived, and to improve query execution performance thanks to the high selectivity of provenance information. In this chapter, we provide a detailed overview of current approaches specifically designed for Linked Data management. We focus on storage models, indexing techniques, and query execution strategies. Finally, we provide an overview of provenance models, definitions, and serialization techniques for Linked Data, and we survey the database management systems that implement techniques for managing provenance information in the context of Linked Data.
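The join cost of a single triple table can be made concrete with a small sketch. The example below is only an illustration under stated assumptions, not one of the systems surveyed in the chapter: it stores a handful of made-up statements in an in-memory SQLite table with a fourth column recording the source graph, and answers a three-pattern query ("which city does each person work in?") with two self-joins, one per additional triple pattern. All identifiers (`ex:alice`, `ex:sourceA`, `build_store`, `people_and_cities`) are hypothetical. The source column is also used to report, for each answer, which sources contributed to it, and to discard answers that rely on untrusted sources.

```python
import sqlite3


def build_store():
    """Create an in-memory table of (subject, predicate, object, graph) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE quads (s TEXT, p TEXT, o TEXT, g TEXT)")
    rows = [
        # Illustrative data only; every identifier here is made up.
        ("ex:alice", "rdf:type", "ex:Person", "ex:sourceA"),
        ("ex:alice", "ex:worksFor", "ex:acme", "ex:sourceA"),
        ("ex:acme", "ex:locatedIn", "ex:Zurich", "ex:sourceB"),
        ("ex:bob", "rdf:type", "ex:Person", "ex:sourceB"),
        ("ex:bob", "ex:worksFor", "ex:initech", "ex:sourceB"),
        ("ex:initech", "ex:locatedIn", "ex:Austin", "ex:sourceC"),
    ]
    conn.executemany("INSERT INTO quads VALUES (?, ?, ?, ?)", rows)
    return conn


# Three triple patterns over one table require two self-joins.
# Each further pattern in a query adds another join on the same table.
PEOPLE_AND_CITIES = """
SELECT t1.s, t3.o, t1.g, t2.g, t3.g
FROM quads AS t1
JOIN quads AS t2 ON t2.s = t1.s
JOIN quads AS t3 ON t3.s = t2.o
WHERE t1.p = 'rdf:type' AND t1.o = 'ex:Person'
  AND t2.p = 'ex:worksFor'
  AND t3.p = 'ex:locatedIn'
ORDER BY t1.s
"""


def people_and_cities(conn, trusted_sources=None):
    """Return (person, city, lineage) tuples.

    lineage is the sorted list of source graphs whose rows were joined
    to produce the answer. If trusted_sources is given, answers that
    depend on any other source are dropped.
    """
    results = []
    for person, city, g1, g2, g3 in conn.execute(PEOPLE_AND_CITIES):
        lineage = sorted({g1, g2, g3})
        if trusted_sources is not None and not set(lineage) <= set(trusted_sources):
            continue
        results.append((person, city, lineage))
    return results


if __name__ == "__main__":
    store = build_store()
    for row in people_and_cities(store):
        print(row)
```

Here the source filter is applied after the joins purely for readability. A store that exploits the selectivity of provenance information would more plausibly restrict the rows by source before joining, which is the performance benefit the abstract alludes to.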

Notes

  1. http://www.w3.org/Addressing/.

  2. http://www.w3.org/Protocols/.

  3. http://www.w3.org/RDF/.

  4. http://www.w3.org/TR/sparql11-query/.

  5. https://www.w3.org/TR/rdf11-concepts/#section-dataset.

  6. In databases, indexes are used to locate data without scanning the entire data space, and thus to improve the speed of retrieval operations. For more details about database concepts, we refer the reader to the literature introduced in Sect. 2.

  7. Optimization statistics are used in databases to choose the best execution plan for a query. They contain information describing the distribution of various objects in a database.

  8. From the database perspective, a context can be seen as a graph, so there is no difference between these two concepts in terms of data management. Here, we keep the original terminology of the authors of each approach.

  9. http://bmagic.sourceforge.net/dGap.html.

  10. https://jena.apache.org/documentation/tdb/.

  11. http://dublincore.org/documents/dcmi-terms/.

  12. http://inference-web.org/wiki/PML_3.0.

  13. http://www.w3.org/TR/hcls-dataset/.

Corresponding author

Correspondence to Marcin Wylot.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Hauswirth, M., Wylot, M., Grund, M., Groth, P., Cudré-Mauroux, P. (2017). Linked Data Management. In: Zomaya, A., Sakr, S. (eds) Handbook of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-49340-4_9

  • DOI: https://doi.org/10.1007/978-3-319-49340-4_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49339-8

  • Online ISBN: 978-3-319-49340-4

  • eBook Packages: Computer Science, Computer Science (R0)
