Non-native RDF Storage Engines

Hauwirth, Manfred; Wylot, Marcin; Grund, Martin; Sakr, Sherif; Cudré-Mauroux, Phillippe

doi:10.1007/978-3-319-49340-4_10

Abstract

The proliferation of heterogeneous Linked Data requires data management systems to continually improve their scalability and efficiency. Linked Data can be stored according to many different data storage models. Some of these approaches rely on general-purpose database storage techniques to persist Linked Data, and can therefore leverage existing data processing environments (e.g., large Hadoop clusters). In this chapter, we survey the variety of Linked Data storage systems, which we group into three classes: relational database-based systems, NoSQL-based systems, and massively parallel systems.

Author information

Authors and Affiliations

Open Distributed Systems, TU Berlin, Berlin, Germany
Manfred Hauwirth & Marcin Wylot
Open Distributed Systems, Fraunhofer FOKUS, Berlin, Germany
Manfred Hauwirth & Marcin Wylot
eXascale Infolab, University of Fribourg, Fribourg, Switzerland
Martin Grund & Phillippe Cudré-Mauroux
University of New South Wales, Kensington, Australia
Sherif Sakr

Corresponding author

Correspondence to Marcin Wylot.

Editor information

Editors and Affiliations

School of Information Technologies, The University of Sydney, Sydney, New South Wales, Australia
Albert Y. Zomaya
The School of Computer Science, The University of New South Wales, Eveleigh, New South Wales, Australia
Sherif Sakr

About this chapter

Cite this chapter

Hauwirth, M., Wylot, M., Grund, M., Sakr, S., Cudré-Mauroux, P. (2017). Non-native RDF Storage Engines. In: Zomaya, A., Sakr, S. (eds) Handbook of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-49340-4_10

DOI: https://doi.org/10.1007/978-3-319-49340-4_10
Published: 26 February 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49339-8
Online ISBN: 978-3-319-49340-4
eBook Packages: Computer Science, Computer Science (R0)
