Relational schemata for distributed SPARQL query processing

Authors: Victor Anthony Arrascue Ayala, Michael Färber, Patrick Philipp, Guilherme Schievelbein, Georg Lausen

SBD '19: Proceedings of the International Workshop on Semantic Big Data

Article No.: 3, Pages 1 - 6

https://doi.org/10.1145/3323878.3325804

Published: 05 July 2019

Abstract

To benefit from mature database technology, RDF stores are built on top of relational databases and SPARQL queries are mapped into SQL. A shared-nothing computer cluster offers a way to achieve scalability by processing queries over large RDF datasets in a distributed fashion. To this end, this paper examines the impact of relational schema design when queries are mapped into Apache Spark SQL. Four designs are considered: a single triple table; a set of tables resulting from partitioning by predicate; a single wide table covering all properties; and a set of tables based on the application model specification, called the domain-dependent schema. For each approach, the rows of the corresponding tables are stored in the distributed file system HDFS using the columnar storage format Parquet. Experiments using standard benchmarks demonstrate that the single wide property table approach, despite its simplicity, is superior to the other approaches. Further experiments demonstrate that this single-table approach remains attractive even when repartitioning by key (RDF subject) is applied before executing queries.
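The difference between three of these designs can be illustrated with a minimal sketch. It is not the authors' implementation: it uses Python's built-in `sqlite3` on a single machine in place of Apache Spark SQL over Parquet/HDFS, the toy triples and all table and function names (`triples`, `vp_<predicate>`, `wpt`, `star_query_results`) are hypothetical, and the wide table assumes every predicate is single-valued. The domain-dependent schema is omitted because it depends on an application model that the abstract does not specify. The sketch evaluates one star-shaped pattern, roughly `SELECT ?n ?w WHERE { ?x name ?n . ?x worksFor ?w }`, against each layout.

```python
import sqlite3
from collections import defaultdict

# Hypothetical toy RDF data: (subject, predicate, object).
TRIPLES = [
    ("alice", "name", "Alice"),
    ("alice", "age", "30"),
    ("alice", "worksFor", "acme"),
    ("bob", "name", "Bob"),
    ("bob", "worksFor", "acme"),
    ("acme", "name", "ACME Corp"),
]


def build_triple_table(con, triples):
    """Single triple table: one row per triple."""
    con.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    con.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)


def build_vertical_partitioning(con, triples):
    """Partitioning by predicate: one two-column (s, o) table per predicate."""
    by_pred = defaultdict(list)
    for s, p, o in triples:
        by_pred[p].append((s, o))
    for p, rows in by_pred.items():
        con.execute(f'CREATE TABLE "vp_{p}" (s TEXT, o TEXT)')
        con.executemany(f'INSERT INTO "vp_{p}" VALUES (?, ?)', rows)


def build_wide_property_table(con, triples):
    """Single wide table: one row per subject, one column per predicate.

    Assumes single-valued predicates; missing values become NULL.
    """
    preds = sorted({p for _, p, _ in triples})
    cols = ", ".join(f'"{p}" TEXT' for p in preds)
    con.execute(f"CREATE TABLE wpt (s TEXT PRIMARY KEY, {cols})")
    by_subject = defaultdict(dict)
    for s, p, o in triples:
        by_subject[s][p] = o
    placeholders = ", ".join("?" for _ in range(len(preds) + 1))
    for s, vals in by_subject.items():
        con.execute(
            f"INSERT INTO wpt VALUES ({placeholders})",
            [s] + [vals.get(p) for p in preds],
        )


# The same star pattern expressed against each layout.
QUERIES = {
    # Triple table: one self-join per additional triple pattern.
    "triple_table": """
        SELECT t1.o, t2.o FROM triples t1
        JOIN triples t2 ON t1.s = t2.s
        WHERE t1.p = 'name' AND t2.p = 'worksFor'
        ORDER BY t1.o""",
    # Predicate partitioning: join the per-predicate tables on subject.
    "vertical_partitioning": """
        SELECT n.o, w.o FROM "vp_name" n
        JOIN "vp_worksFor" w ON n.s = w.s
        ORDER BY n.o""",
    # Wide table: no join, only NULL filters on the needed columns.
    "wide_property_table": """
        SELECT "name", "worksFor" FROM wpt
        WHERE "name" IS NOT NULL AND "worksFor" IS NOT NULL
        ORDER BY "name" """,
}


def star_query_results(triples=TRIPLES):
    """Build all three layouts in memory and run the star query on each."""
    con = sqlite3.connect(":memory:")
    build_triple_table(con, triples)
    build_vertical_partitioning(con, triples)
    build_wide_property_table(con, triples)
    results = {name: con.execute(sql).fetchall() for name, sql in QUERIES.items()}
    con.close()
    return results


if __name__ == "__main__":
    for layout, rows in star_query_results().items():
        print(layout, rows)
```

All three queries return the same rows; what changes is the number of joins needed. The wide table answers a subject-centred star pattern without any join, which is one plausible reason it can do well in a distributed setting, although the sketch itself says nothing about performance.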

Index Terms

Relational schemata for distributed SPARQL query processing
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
      1. Cloud computing
2. Information systems
  1. Data management systems
    1. Database management system engines
      1. Parallel and distributed DBMSs
        Relational parallel and distributed DBMSs
  2. World Wide Web
    1. Web data description languages
      1. Semantic web description languages
        Resource Description Framework (RDF)

Published In

SBD '19: Proceedings of the International Workshop on Semantic Big Data

July 2019

57 pages

ISBN:9781450367660

DOI:10.1145/3323878

Editors: Sven Groppe (University of Lübeck, Germany), Le Gruenwald (University of Oklahoma)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGMOD/PODS '19

Sponsor:

SIGMOD

SIGMOD/PODS '19: International Conference on Management of Data

July 5, 2019

Amsterdam, Netherlands

Acceptance Rates

SBD '19 Paper Acceptance Rate 8 of 15 submissions, 53%;

Overall Acceptance Rate 30 of 54 submissions, 56%

Cited By

Yuan Q, Yuan Y, Wen Z, Wang H, Tang S, Chen H, Duh W, Huang H, Kato M, Mothe J, Poblete B (2023). An Effective Framework for Enhancing Query Answering in a Heterogeneous Data Lake. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 770-780. DOI: 10.1145/3539618.3591637. Online publication date: 19-Jul-2023.
https://dl.acm.org/doi/10.1145/3539618.3591637

Bilidas D, Ioannidis T, Mamoulis N, Koubarakis M (2022). Strabo 2: Distributed Management of Massive Geospatial RDF Datasets. The Semantic Web – ISWC 2022, pp. 411-427. DOI: 10.1007/978-3-031-19433-7_24. Online publication date: 16-Oct-2022.
https://doi.org/10.1007/978-3-031-19433-7_24

Ragab M (2022). Towards Prescriptive Analyses of Querying Large Knowledge Graphs. New Trends in Database and Information Systems, pp. 639-647. DOI: 10.1007/978-3-031-15743-1_59. Online publication date: 29-Aug-2022.
https://doi.org/10.1007/978-3-031-15743-1_59

Ragab M, Tommasini R, Eyvazov S, Sakr S (2020). Towards making sense of Spark-SQL performance for processing vast distributed RDF datasets. Proceedings of The International Workshop on Semantic Big Data, pp. 1-6. DOI: 10.1145/3391274.3393632. Online publication date: 14-Jun-2020.
https://dl.acm.org/doi/10.1145/3391274.3393632

Pilven M, Scherzinger S, d’Orazio L (2019). On Complex Value Relations in Hive. Advances in Conceptual Modeling, pp. 146-156. DOI: 10.1007/978-3-030-34146-6_13. Online publication date: 27-Oct-2019.
https://doi.org/10.1007/978-3-030-34146-6_13
