RDF in the clouds: a survey

Kaoudi, Zoi; Manolescu, Ioana

doi:10.1007/s00778-014-0364-z

Regular Paper
Published: 11 July 2014

Volume 24, pages 67–91 (2015)

The VLDB Journal

Zoi Kaoudi¹ &
Ioana Manolescu¹

Abstract

The Resource Description Framework (RDF) pioneered by the W3C is increasingly being adopted to model data in a variety of scenarios, in particular data to be published or exchanged on the Web. Managing large volumes of RDF data is challenging, due to the sheer size, the heterogeneity, and the further complexity brought by RDF reasoning. To tackle the size challenge, distributed storage architectures are required. Cloud computing is an emerging paradigm, widely adopted in many applications for the scalability, fault-tolerance, and elasticity features it provides, which make distributed and parallel architectures easy to deploy. In this article, we survey RDF data management architectures and systems designed for a cloud environment and, more generally, large-scale RDF data management systems that can be easily deployed there. We first give the necessary background, then describe the existing systems and proposals in this area, and classify them according to dimensions related to their capabilities and implementation techniques. The survey ends with a discussion of open problems and perspectives.
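The abstract names data size as the first challenge and distributed storage as the remedy. As an illustration only, the Python sketch below shows one simple placement strategy, assumed here for the example: hash each triple's subject to pick a storage node. It also shows why this lets subject-centered ("star") queries run without moving data between nodes. The `ex:` IRIs, the function names `partition_by_subject` and `star_query_local`, and the node count are all hypothetical. The systems surveyed in the article use a variety of partitioning and indexing schemes, and this sketch is not any particular system's method.

```python
from collections import defaultdict
from zlib import crc32

# A triple is (subject, property, object); IRIs are abbreviated strings here.
Triple = tuple[str, str, str]


def partition_by_subject(triples: list[Triple], num_nodes: int) -> dict[int, list[Triple]]:
    """Assign each triple to a node by hashing its subject (hypothetical helper).

    crc32 is used instead of the built-in hash(), because hash() of a str is
    randomized per process and would give different placements on different machines.
    """
    nodes: dict[int, list[Triple]] = defaultdict(list)
    for s, p, o in triples:
        node = crc32(s.encode("utf-8")) % num_nodes
        nodes[node].append((s, p, o))
    return dict(nodes)


def star_query_local(partition: list[Triple], props: list[str]) -> set[str]:
    """Return subjects in one partition that have every property in `props`.

    Since all triples of a subject are stored on the same node, this
    subject-subject (star) join needs no data exchange between nodes.
    """
    by_subject: dict[str, dict[str, list[str]]] = defaultdict(dict)
    for s, p, o in partition:
        by_subject[s].setdefault(p, []).append(o)
    return {s for s, po in by_subject.items() if all(p in po for p in props)}


triples: list[Triple] = [
    ("ex:alice", "ex:worksFor", "ex:inria"),
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:bob", "ex:worksFor", "ex:athena"),
    ("ex:carol", "ex:name", '"Carol"'),
]

parts = partition_by_subject(triples, num_nodes=3)

# Each node evaluates the star query locally; the union is the global answer.
answer: set[str] = set()
for partition in parts.values():
    answer |= star_query_local(partition, ["ex:worksFor", "ex:name"])

print(sorted(answer))  # ['ex:alice']
```

Subject hashing is a trade-off. Star joins stay local, but joins that connect one triple's object to another triple's subject (path-shaped patterns such as `?x ex:worksFor ?y . ?y ex:locatedIn ?z`) generally require shuffling intermediate results across nodes.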

Notes

http://en.wikipedia.org/wiki/Open_data
From now on, we will use the term RDF(S) to refer to both RDF and RDFS.
http://www.w3.org/TR/sparql11-property-paths/

Author information

Zoi Kaoudi
Present address: Athena Research Center, IMIS, Athens, Greece

Authors and Affiliations

Inria Saclay–Île-de-France and Université Paris-Sud, Bâtiment 650 (PCRI), 91405 Orsay Cedex, France
Zoi Kaoudi & Ioana Manolescu

Corresponding author

Correspondence to Zoi Kaoudi.

About this article

Cite this article

Kaoudi, Z., Manolescu, I. RDF in the clouds: a survey. The VLDB Journal 24, 67–91 (2015). https://doi.org/10.1007/s00778-014-0364-z

Received: 28 October 2013
Revised: 14 April 2014
Accepted: 10 June 2014
Published: 11 July 2014
Issue Date: February 2015
DOI: https://doi.org/10.1007/s00778-014-0364-z

Similar content being viewed by others

An Exploratory Study of RDF: A Data Model for Cloud Computing

A survey of RDF data management systems

NoSQL Databases for RDF: An Empirical Evaluation
