1 Introduction

The Linked Data paradigm, which is now the prominent enabler for sharing huge volumes of data using Semantic Web technologies, has created novel challenges for non-relational data management technologies such as RDF and graph database systems. The semantics of Linked Data are expressed in terms of the RDF Schema Language (RDFS) and the OWL Web Ontology Language. RDFS and OWL vocabularies are used by nearly all data sources in the LOD cloud. Moreover, according to a recent study, 36.49 % of LOD datasets use various OWL fragments, so it becomes critical to optimize RDF engines by taking OWL features into account.

Commercial RDF engines implement RDFS and OWL rules by performing forward or backward reasoning. Regardless of the reasoning approach, they typically store RDF data in a single large triple table, so the evaluation of a SPARQL query boils down to a query with a large number of costly self-joins. To evaluate such demanding SPARQL queries, a number of prototype systems have been proposed. Many of these approaches map the regular, schema-conforming part of the RDF dataset into a set of relational tables [1, 2, 4] and rely on the optimization techniques of the underlying DBMSs for query evaluation. Other approaches [6, 7, 9, 13] build extensive main-memory indexes over RDF triples. In either case, the information residing in OWL schemas is rarely taken into account, with [3, 5, 8, 11] being notable exceptions; we therefore believe that an OWL schema-aware SPARQL query optimizer could complement those approaches, since many datasets (especially in the LOD Cloud) come with good-quality schemas.

In this paper we discuss how schema information expressed in terms of OWL ontologies can be used to perform interesting, possibly complex, optimizations that improve SPARQL query execution plans and, consequently, the performance of RDF engines. Such optimizations can be employed in a complementary fashion to traditional ones to further improve query planners’ performance. Our intention in this work is not to provide full solutions, but to present the potential of the idea (fully described in [10]) by discussing some possible types of optimizations (Sect. 2); many more may exist.

2 Schema Based Optimization Techniques

2.1 Constraint Violation

An RDF engine could take advantage, at compile time, of class and property constraints expressed in an OWL schema; these include equivalence (owl:equivalentClass, owl:equivalentProperty) and disjointness (owl:disjointWith, owl:propertyDisjointWith) of classes and properties, as well as constraints on a property’s domain and range (\({\mathtt {rdfs{:}domain}}\) and \({\mathtt {rdfs{:}range}}\), resp.). For instance, a query looking for an instance of two disjoint classes (owl:disjointWith construct) is certain to return no answers, so it should be answerable in constant time, without having the query engine evaluate it against the data. This kind of information is important for RDF engines that follow either a forward or a backward reasoning approach for computing the inferred knowledge.
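As a minimal sketch (the classes :Student and :Staff and the default prefix are hypothetical; standard rdf: and owl: prefix declarations are omitted), the query below contradicts the schema’s disjointness axiom, so the optimizer can return an empty result without touching the data:

# Schema (Turtle): the two classes are declared disjoint
:Student  owl:disjointWith  :Staff .

# Query (SPARQL): ?x would have to be an instance of both disjoint
# classes, so the result is necessarily empty and can be returned
# in constant time
SELECT ?x WHERE {
  ?x  rdf:type  :Student .
  ?x  rdf:type  :Staff .
}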

2.2 Selectivity Estimation

Cardinality Constraints: OWL allows defining cardinality restrictions through the min (owl:minCardinality), max (owl:maxCardinality) and exact (owl:cardinality) cardinality constraints for object and datatype properties, which state how many values of the property a resource can have. These schema-level constraints can be used to guide the optimizer toward an efficient join ordering without resorting to statistics [3, 5]. To do so, triple patterns that refer to more selective properties (e.g., functional properties, owl:FunctionalProperty) could be pushed down in the plan to reduce intermediate results, as in the sketch below.
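A minimal sketch (the properties :hasSSN and :worksWith are hypothetical; prefix declarations are omitted): declaring :hasSSN functional tells the optimizer that the corresponding pattern yields at most one binding per subject, so it can be joined early.

# Schema (Turtle): each subject has at most one value for :hasSSN
:hasSSN  rdf:type  owl:FunctionalProperty .

# Query (SPARQL): the :hasSSN pattern contributes at most one row per ?x,
# whereas :worksWith may contribute many; evaluating the functional
# pattern first keeps intermediate results small
SELECT ?x ?ssn ?y WHERE {
  ?x  :hasSSN     ?ssn .
  ?x  :worksWith  ?y .
}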

Complex Class Expressions: The selectivity of triple patterns in a SPARQL query can be estimated through OWL constructs that define classes via set operations, such as intersection (\(\mathtt {owl\!\!:\!\!intersectionOf}\)) and union (\(\mathtt {owl\!\!:\!\!unionOf}\)). For example, consider a query that requests instances \(\mathtt {?x}\) of a class \(\mathtt {<\!\!C\!\!>}\), the latter defined as the intersection of classes \(\mathtt {<\!\!C1\!\!>}\) and \(\mathtt {<\!\!C2\!\!>}\), in conjunction with triple patterns (with predicates \(\mathtt {<\!\!P1\!\!>}\) and \(\mathtt {<\!\!P2\!\!>}\)) that relate \(\mathtt {?x}\) to instances \(\mathtt {?y}\) and \(\mathtt {?z}\) of the intersected classes. The class \(\mathtt {<\!\!C\!\!>}\), being more selective, should be considered first in a bushy plan whose two sub-trees (around \(\mathtt {?x}\) and \(\mathtt {?y}\), respectively) are joined with a hash join (right side of Fig. 1). Without the knowledge of schema constraints, the query optimizer would put the three triple patterns with the \(\mathtt {rdf\!\!:\!\!type}\) predicate at the end, since such patterns usually match a large number of triples (left side of Fig. 1) [12]. An analogous line of thought can be followed for the \(\mathtt {owl\!\!:\!\!unionOf}\) construct.

Fig. 1. Optimal plan, which considers C = C1 \(\mathtt {owl\!\!:\!\!intersectionOf}\) C2 (right), and suboptimal plan, which ignores the rule (left).
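The following is one possible SPARQL rendering of a query of this shape (the placeholders <C>, <C1>, <C2>, <P1>, <P2> follow the text; the exact query behind Fig. 1 may differ):

# Schema (Turtle): <C> is defined as the intersection of <C1> and <C2>
<C>  owl:intersectionOf  ( <C1> <C2> ) .

# Query (SPARQL): every instance of <C> is also an instance of <C1> and
# <C2>, so the type pattern on <C> is the most selective one and can
# seed a sub-tree of a bushy plan instead of being deferred
SELECT ?x ?y ?z WHERE {
  ?x  rdf:type  <C>  .
  ?y  rdf:type  <C1> .
  ?z  rdf:type  <C2> .
  ?x  <P1>      ?y   .
  ?x  <P2>      ?z   .
}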

Class and Property Hierarchies: Hierarchies of classes and properties (expressed through rdfs:subClassOf and rdfs:subPropertyOf) can also improve selectivity estimation. In this case, triple patterns that request instances of classes found lower in a class hierarchy should be considered earlier in the query plan (depending on the form of the query) when deciding the join ordering.
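A minimal sketch (the hierarchy and the :advisor property are hypothetical; prefix declarations are omitted): the type pattern on the leaf class matches far fewer triples than the one on the root class, so it is a better starting point for the join.

# Schema (Turtle): a small class hierarchy
:PhDStudent       rdfs:subClassOf  :GraduateStudent .
:GraduateStudent  rdfs:subClassOf  :Person .

# Query (SPARQL): the pattern on :PhDStudent (low in the hierarchy)
# should be evaluated before the pattern on :Person (high in the
# hierarchy), which matches many more instances
SELECT ?x ?y WHERE {
  ?x  rdf:type  :PhDStudent .
  ?y  rdf:type  :Person .
  ?x  :advisor  ?y .
}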

2.3 Advanced Optimizations

In this section we present a set of cases where schema information can help the query engine determine the optimal plan in a more sophisticated way.

Inference: In backward reasoning systems, the knowledge inferred through OWL reasoning rules is computed at query time. In some cases, the same information may be obtained in several ways. For example, assume that we have a long hierarchy where \(\mathtt {<\!\!B_{i}\!\!>}\) is a subclass (rdfs:subClassOf) of \(\mathtt {<\!\!B_{i+1}\!\!>}\), \(i=1,\ldots , n\). Consider also that the domain (\({\mathtt {rdfs{:}domain}}\)) of property \(\mathtt {<\!\!P\!\!>}\) is class \(\mathtt {<\!\!A\!\!>}\) and all its values (\(\mathtt {owl\!\!:\!allValuesFrom}\)) come from the root class \(\mathtt {<\!\!B_{n+1}\!\!>}\). In a query that asks for instances \(\mathtt {?v}\) of class \(\mathtt {<\!\!B_{n+1}\!\!>}\) that are also values of property \(\mathtt {<\!\!P\!\!>}\), there are two ways to obtain the instances \(\mathtt {?v}\): one through \(\mathtt {owl\!\!:\!allValuesFrom}\) (the OWL 2 RL cls-avf rule), and another through the transitivity of rdfs:subClassOf (the cax-sco rule). For large n, class \(\mathtt {<\!\!B_{n+1}\!\!>}\) is positioned high in the hierarchy, so the engine should use the \(\mathtt {owl\!\!:\!allValuesFrom}\) construct to obtain the values for \(\mathtt {?v}\). The alternative may be better if the two classes are sufficiently “close” in the hierarchy, especially given that subsumption-related inference is the most heavily optimized type of inference (due to its widespread use).
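A sketch of the schema and query for n = 3 (all IRIs are placeholders; the owl:allValuesFrom constraint is expressed, as usual, through a restriction class):

# Schema (Turtle): subclass chain :B1 ... :B4, with :B4 the root
:B1  rdfs:subClassOf  :B2 .
:B2  rdfs:subClassOf  :B3 .
:B3  rdfs:subClassOf  :B4 .
:P   rdfs:domain      :A .
:A   rdfs:subClassOf  [ rdf:type           owl:Restriction ;
                        owl:onProperty     :P ;
                        owl:allValuesFrom  :B4 ] .

# Query (SPARQL): ?v rdf:type :B4 can be derived either through the
# restriction on :P (cls-avf) or by walking the subclass chain from
# the asserted class of ?v up to :B4 (cax-sco); for a long chain the
# cls-avf route is preferable
SELECT ?v WHERE {
  ?x  :P        ?v .
  ?v  rdf:type  :B4 .
}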

Star Query Transformation: Schema information can also be used by the query optimizer to rewrite SPARQL queries into equivalent ones whose form lends itself to already known optimization techniques. For example, when a triple pattern involving a symmetric property (owl:SymmetricProperty) “breaks” a star-shaped query pattern (the subject shared by the remaining triple patterns appears in its object position), a schema-aware optimizer should, according to the semantics of owl:SymmetricProperty, rewrite the query into an equivalent one in which all triple patterns share the same subject, as sketched below.
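A sketch of such a rewrite (the property names are hypothetical; prefix declarations are omitted):

# Schema (Turtle):
:collaboratesWith  rdf:type  owl:SymmetricProperty .

# Original query (SPARQL): the second pattern breaks the star on ?x,
# since ?x appears in the object position
SELECT ?x ?n ?a WHERE {
  ?x  :name              ?n .
  ?y  :collaboratesWith  ?x .
  ?x  :age               ?a .
}

# Equivalent star-shaped query: valid because :collaboratesWith is
# symmetric; all patterns now share the subject ?x, so star-join
# optimizations apply
SELECT ?x ?n ?a WHERE {
  ?x  :name              ?n .
  ?x  :collaboratesWith  ?y .
  ?x  :age               ?a .
}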

3 Conclusions

We advocated the use of OWL schema information for improving SPARQL query planning, and described some optimizations that can be employed in this direction. Our proposal is meant to be complementary to well-known optimizations (e.g., statistics-based ones) for query planning, and is most appropriate for datasets and benchmarks that use a rich schema structure (e.g., UOBM). In the future, we plan to work further on understanding the different possible optimizations and their potential trade-offs, so that they can be implemented on top of an RDF store in order to quantify the achieved speed-up.