Non-native RDF Storage Engines

Handbook of Big Data Technologies

Abstract

The proliferation of heterogeneous Linked Data requires data management systems to constantly improve their scalability and efficiency. Linked Data can be stored according to many different data storage models. Some of these models use general-purpose database storage techniques to persist Linked Data, and can therefore leverage existing data processing environments (e.g., large Hadoop clusters). In this chapter, we survey the range of such Linked Data storage systems, which we categorize into three classes: relational database-based systems, NoSQL-based systems, and massively parallel systems.


Notes

  1. http://dublincore.org/.
  2. https://www.w3.org/TR/2014/REC-rdf11-mt-20140225/.
  3. http://ribs.csres.utexas.edu/nosqlrdf/.
  4. http://hbase.apache.org/.
  5. http://hadoop.apache.org/hdfs.
  6. http://zookeeper.apache.org/.
  7. In HBase, the timestamp adds an additional dimension to each cell, besides the column family and the column.
  8. See, e.g., http://www.youtube.com/watch?v=byXGqhz2N5M.
  9. http://hive.apache.org/query.
  10. http://code.google.com/p/cumulusrdf/.
  11. https://accumulo.apache.org/.
  12. http://www.couchbase.com/couchbase-server/architecture.
  13. http://www.openrdf.org/.
  14. https://pig.apache.org/.
  15. http://glaros.dtc.umn.edu/gkhome/views/metis.
  16. http://dbis.informatik.uni-freiburg.de/S2RDF.
  17. http://spark.apache.org/.
  18. https://parquet.apache.org/.

Author information

Corresponding author

Correspondence to Marcin Wylot.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Hauswirth, M., Wylot, M., Grund, M., Sakr, S., Cudré-Mauroux, P. (2017). Non-native RDF Storage Engines. In: Zomaya, A., Sakr, S. (eds) Handbook of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-49340-4_10

  • DOI: https://doi.org/10.1007/978-3-319-49340-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49339-8

  • Online ISBN: 978-3-319-49340-4

  • eBook Packages: Computer Science (R0)
