
1 Introduction

Conjunctive query (CQ) answering over expressive Description Logic (DL) ontologies is a key reasoning task which remains unsolved for many practical purposes. Indeed, answering CQs over DL ontologies is quite intricate and often of high computational complexity [4, 8, 16]. Nevertheless, CQ answering over a major class of DLs, the so-called Horn DLs, can in some cases be addressed via application of the chase algorithm, a technique where all relevant consequences of an ontology are precomputed, allowing queries to be directly evaluated over the materialized set of facts. However, the chase is not guaranteed to terminate for all ontologies, and checking whether it does is not a straightforward procedure. It is thus an ongoing research endeavor to establish so-called acyclicity conditions; i.e., sufficient conditions which ensure termination of the chase.

The main contribution of this paper is the definition of restricted chase acyclicity (\(\text {RCA}_{n}\)), a novel acyclicity condition for Horn-\(\mathcal {SRIQ}\) ontologies (the DL Horn-\(\mathcal {SRIQ}\) may be informally described as the logic underpinning the deterministic fragment of OWL DL [9] minus nominals). If an ontology is proven to be \(\text {RCA}_{n}\), then n-cyclic terms do not occur during the computation of the chase of that ontology and thus the chase is guaranteed to terminate.

In contrast with existing acyclicity notions [6] which deal with termination of the unrestricted, i.e. oblivious, chase of arbitrary sets of existential rules, we restrict our attention to the language Horn- \(\mathcal {SRIQ}\) and seek to achieve termination of the restricted chase algorithm [3]; this is a special variant of the standard chase in which the inclusion of further terms to satisfy existential restrictions is avoided if such restrictions are already satisfied, and equality is dealt with via renaming. By considering such a chase algorithm we are able to devise acyclicity conditions which are more general than any other of the notions previously described.

On the theoretical side, we show that \(\text {RCA}_{n}\) is more general than model-faithful acyclicity (MFA) provided n is sufficiently large (linear in the size of the ontology). As shown in [6], MFA is one of the most general acyclicity conditions for ontologies described to date, as it encompasses many other existing notions such as joint acyclicity [12], super-weak acyclicity [14] or the hybrid acyclicity notions presented in [2]. Furthermore, we show that deciding \(\text {RCA}_{n}\) membership is not harder than deciding MFA membership.

On the practical side, we empirically show that (i) \(\text {RCA}_{n}\) characterizes more real-world ontologies as acyclic than MFA. Furthermore, we demonstrate that (ii) the specific type of acyclicity captured by \(\text {RCA}_{n}\) results in a more efficient reasoning procedure. This is because acyclicity is preserved when renaming techniques are employed to reason in the presence of equality. Thus, the use of cumbersome axiomatizations of equality such as singularization [14] can be avoided. Moreover, we report on an implementation of the restricted chase algorithm based on the datalog engine RDFox [15] and show that (iii) it vastly outperforms state-of-the-art DL reasoners. To verify (i–iii), we carry out an extensive evaluation with very encouraging results.

The rest of the paper is structured as follows: We start with some preliminaries in Sect. 2. Section 3 formally introduces the notions of oblivious and restricted chase, followed by an overview of MFA in Sect. 4. In Sect. 5 we introduce our new acyclicity notion \(\text {RCA}_{n}\). Finally, Sects. 6 and 7 describe the evaluation of our work and list our conclusions, respectively.

An extended technical report for this paper with all the proofs and further information concerning the evaluation can be found at http://dase.cs.wright.edu/publications/acyclicity-notion-cqa-over-horn-sriq-ontologies.

2 Preliminaries

Rules. We use the standard notions of constants, function symbols and predicates, where \(\approx \) is the equality predicate, \(\top \) is universal truth, and \(\bot \) is universal falsehood. Variables, terms, atoms and substitutions are defined as usual. A fact is a ground atom; i.e., an atom without occurrences of variables. As customary, every term t is associated with some depth \(dep (t) \ge 0\). Furthermore, we often abbreviate a vector of terms \(t_1, \ldots , t_n\) as \({\varvec{t}}\) and identify \({\varvec{t}}\) with the set \(\{t_1, \ldots , t_n\}\). In a similar manner, we often identify a conjunction of atoms \({\phi }_1 \wedge \ldots \wedge {\phi }_n\) with the set \(\{{\phi }_1, \ldots , {\phi }_n\}\). With \({\phi }({\varvec{x}})\) we stress that \({\varvec{x}}= x_1, \ldots , x_n\) are the free variables occurring in the formula \(\phi \).

Let t be some ground term and c some constant. Let \(t_c\) be the term obtained from t by replacing every occurrence of a constant by c, i.e., \(f(d, g(e))_c = f(c, g(c))\). The notation is analogously extended to facts and sets of facts.
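The replacement \(t_c\) can be illustrated with a short Python sketch. The encoding is an assumption made here for illustration only: a constant is a string, a functional term is a tuple whose first component is the function symbol, and a fact is a tuple whose first component is the predicate.

```python
# A constant is a str; a functional term is a tuple (symbol, arg_1, ..., arg_k).
def replace_constants(term, c):
    """Return `term` with every constant replaced by the constant `c`."""
    if isinstance(term, str):
        return c
    return (term[0],) + tuple(replace_constants(arg, c) for arg in term[1:])


def replace_in_facts(facts, c):
    """Lift the replacement to a set of facts (predicate, term_1, ..., term_k)."""
    return {(f[0],) + tuple(replace_constants(t, c) for t in f[1:]) for f in facts}


t = ("f", "d", ("g", "e"))        # the term f(d, g(e)) from the text
print(replace_constants(t, "c"))  # the term f(c, g(c)) in the tuple encoding
```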

A term \(t'\) is a subterm of another term t if and only if \(t' = t\), or \(t = f({\varvec{s}})\) and \(t'\) is a subterm of some \(s \in {\varvec{s}}\); if additionally \(t' \ne t\), then \(t'\) is a proper subterm of t. A term t is n-cyclic if and only if there exists a sequence of terms of the form \(f(\varvec{s_1}), \ldots , f(\varvec{s_{n+1}})\) such that \(f(\varvec{s_{n+1}})\) is a subterm of t and, for every \(i = 1, \ldots , n\), \(f(\varvec{s_i})\) is a proper subterm of \(f(\varvec{s_{i+1}})\). We simply refer to 1-cyclic terms as cyclic.
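A possible way to test n-cyclicity is sketched below in Python, using the same hypothetical tuple encoding of terms (constants as strings, functional terms as tuples headed by the function symbol). Since the terms \(f(\varvec{s_1}), \ldots , f(\varvec{s_{n+1}})\) are nested inside one another, a term is n-cyclic exactly when some function symbol occurs at least \(n+1\) times along a single path of nested subterms; the sketch counts such occurrences.

```python
def max_nesting(term, counts=None):
    """Largest number of occurrences of one function symbol along a single
    path of nested subterms of `term`."""
    counts = dict(counts or {})
    if isinstance(term, str):  # constant: the path ends here
        return max(counts.values(), default=0)
    counts[term[0]] = counts.get(term[0], 0) + 1
    return max([max(counts.values())] + [max_nesting(a, counts) for a in term[1:]])


def is_n_cyclic(term, n):
    """True if some symbol f is nested n + 1 times inside `term`."""
    return max_nesting(term) >= n + 1


print(is_n_cyclic(("f", ("g", ("f", "a"))), 1))     # f occurs twice on one path
print(is_n_cyclic(("g", ("f", "a"), ("f", "b")), 1))  # the two f-terms are not nested
```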

A rule is a first-order logic (FOL) formula of one of the forms

$$\begin{aligned} \forall {\varvec{x}}\forall \varvec{z} [\beta ({\varvec{x}}, \varvec{z})&\rightarrow \exists \varvec{y} \eta ({\varvec{x}}, \varvec{y})] \qquad \text {or} \end{aligned}$$
(1)
$$\begin{aligned} \forall {\varvec{x}}[\beta ({\varvec{x}})&\rightarrow x \approx y] , \end{aligned}$$
(2)

where \(\beta \) and \(\eta \) are non-empty conjunctions of atoms which do not contain occurrences of constants, function symbols nor of the predicate \(\approx \); \(\varvec{x}\), \(\varvec{y}\) and \(\varvec{z}\) are pairwise disjoint; and \(x, y \in {\varvec{x}}\). To simplify the notation, we frequently omit the universal quantifiers from rules. As customary, we refer to rules of the forms (1) and (2) as tuple generating dependencies (TGDs) and equality generating dependencies (EGDs), respectively.

Given a set of rules \(\mathcal {R}\), we define \(\mathcal {R} ^\exists \) and \(\mathcal {R} ^\forall \) as the sets of all the TGDs in \(\mathcal {R}\) which do and do not contain existentially quantified variables, respectively. Moreover, let \(\mathcal {R} ^\approx \) be the set of all EGDs in \(\mathcal {R}\). A program is a tuple \(\langle \mathcal {R},\mathcal {I} \rangle \) where \(\mathcal {R}\) is a set of rules and \(\mathcal {I}\) is an instance; i.e., a finite set of equality-free facts.

The main reasoning task we are investigating in this paper is CQ answering. Nevertheless, for the rest of the paper, we restrict our attention to the simpler task of CQ entailment of boolean conjunctive queries (BCQs). This is without loss of generality since CQ answering can be reduced to checking entailment of BCQs. A BCQ, or simply a query, is a formula of the form \(\exists \varvec{y} \eta (\varvec{y})\) where \(\eta \) is a conjunction of atoms not containing occurrences of constants, function symbols nor \(\approx \).

For the remainder of the paper, we assume that \(\top \) and \(\bot \) are treated as ordinary unary predicates and that the semantics of \(\top \) is captured explicitly in any program \({\mathcal {P} = \langle \mathcal {R},\mathcal {I} \rangle } \) by including the rule \(p(x_1, \ldots , x_n) \rightarrow \top (x_1) \wedge \ldots \wedge \top (x_n)\) in \(\mathcal {R} \) for every predicate p with arity n occurring in \(\mathcal {P} \).

We interpret programs under standard FOL semantics with true equality. As usual, a program \(\mathcal {P} \) is satisfiable if and only if \(\mathcal {P} \not \models \exists y\bot (y)\). Furthermore, given some query \(\gamma \), we write \(\mathcal {P} \models \gamma \) to indicate that \(\mathcal {P}\) entails \(\gamma \).

We will later employ skolemization to define the consequences of a TGD over a set of facts. The skolemization \(sk(\rho )\) of some TGD \(\rho = \beta ({\varvec{x}}, \varvec{z}) \rightarrow \exists \varvec{y} \eta ({\varvec{x}}, \varvec{y})\) is the rule \(\beta ({\varvec{x}}, \varvec{z}) \rightarrow \eta ({\varvec{x}}, \varvec{y})\sigma _{sk}\) where \(\sigma _{sk}\) is a substitution mapping every \(y \in \varvec{y} \) into \(f_{\rho }^{y}({\varvec{x}})\) where \(f_{\rho }^{y}\) is a fresh function unique for every variable y and TGD \(\rho \).
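As an illustration, skolemization can be sketched in Python as follows. The encoding is hypothetical: an atom is a tuple (predicate, variable, ...), a rule is given by its name, body, head and list of existential variables, and the Skolem symbol \(f_{\rho }^{y}\) is rendered as the string `f_<name>^<y>`. The arguments of each Skolem term are the body variables that also occur in the head, which play the role of \({\varvec{x}}\).

```python
def skolemize(name, body, head, exist_vars):
    """Replace every existential variable y in the head by f_name^y(x),
    where x lists the variables shared by body and head."""
    body_vars = [v for atom in body for v in atom[1:]]
    head_vars = {v for atom in head for v in atom[1:]}
    frontier = tuple(dict.fromkeys(v for v in body_vars if v in head_vars))
    sigma = {y: (f"f_{name}^{y}",) + frontier for y in exist_vars}
    new_head = [(a[0],) + tuple(sigma.get(v, v) for v in a[1:]) for a in head]
    return body, new_head


# Film(x) -> exists y [isProdBy(x, y) and Producer(y)]
body, head = skolemize(
    "rho",
    [("Film", "x")],
    [("isProdBy", "x", "y"), ("Producer", "y")],
    ["y"],
)
print(head)
```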

Description Logics. We next define the syntax and semantics of the ontology language Horn- \(\mathcal {SRIQ}\) [13]. We assume basic familiarity with DL, and refer the reader to the literature for further details [1]. Without loss of generality, we restrict our attention to ontologies in a normal form close to the one from [13].

A DL signature is a tuple \(\langle {{N}_{C}}, {N}_{R}, {{N}_{I}}\rangle \) where \({{N}_{C}}\), \({N}_{R} \) and \({{N}_{I}}\) are countably infinite and mutually disjoint sets of concept names, role names and individuals, respectively, such that \(\{\bot , \top \} \subseteq {{N}_{C}}\). A role is an element of \({N}^-_{R} = {N}_{R} \cup \{R^- \mid R \in {N}_{R} \}\). A TBox axiom is a formula of one of the forms given on the left hand side of the mappings in Fig. 1. TBox axioms of the form \(A \sqsubseteq \exists R.B\) are also referred to as existential axioms. An ABox axiom is a formula of the form A(a) or R(a, b) where \(A \in {{N}_{C}}\), \(R \in {N}_{R} \) and \(a, b \in {{N}_{I}}\). An axiom is either a TBox or an ABox axiom. As usual, we simply refer to a set of TBox (resp. ABox) axioms as a TBox (resp. an ABox).

Fig. 1. Mapping axioms \(\alpha \) to rules \(\varPi (\alpha )\), where \(A_{(i)}, B \in {{N}_{C}}\), \(R, S, V \in {N}_{R} \).

A Horn- \(\mathcal {SRIQ}\) ontology \(\mathcal {O} \) (or simply an ontology) is some tuple \(\langle \mathcal {T}, \mathcal {A} \rangle \), where \(\mathcal {T} \) and \(\mathcal {A} \) are a TBox and an ABox, respectively, which satisfies the usual conditions [10].

Due to the close correspondence between ontologies and programs, we define the semantics of the former by means of a mapping into the latter. Given some TBox \(\mathcal {T} \), let \(\mathcal {R} _\mathcal {T} = \varPi (\mathcal {T})\), where \(\varPi \) is the function from Fig. 1. Given some ontology \(\mathcal {O} = \langle \mathcal {T}, \mathcal {A} \rangle \), let \(\mathcal {P} (\mathcal {O}) = \langle \mathcal {R} _\mathcal {T} , \mathcal {A} \rangle \). We say that \(\mathcal {O}\) is satisfiable if and only if the program \(\mathcal {P} (\mathcal {O})\) is satisfiable. Furthermore, \(\mathcal {O}\) entails a query \(\gamma \), written \(\mathcal {O} \models \gamma \), if and only if \(\mathcal {P} (\mathcal {O})\) is unsatisfiable or \(\mathcal {P} (\mathcal {O}) \) entails \(\gamma \).

3 The Chase Algorithm

In this section we present two variants of the chase algorithm, which are similar to the oblivious and restricted chase from [3], and elaborate on how such procedures may be used to solve CQ entailment over ontologies.

Definition 1

A fact \(\phi \) is an oblivious consequence of a TGD \(\rho = \beta ({\varvec{x}}, \varvec{z}) \rightarrow \exists \varvec{y} \eta ({\varvec{x}}, \varvec{y})\) on a set of facts \(\mathcal {F}\) if and only if there is some substitution \(\sigma \) with \(\beta ({\varvec{x}}, \varvec{z})\sigma \subseteq {\mathcal {F}}\) and \({\phi }\in sk(\eta ({\varvec{x}}, \varvec{y}))\sigma \) where \(sk(\eta ({\varvec{x}}, \varvec{y}))\) is the head of the (skolemized) TGD \(sk(\rho )\). A fact \(\phi \) is a restricted consequence of \(\rho \) on \(\mathcal {F}\) if and only if there is a substitution \(\sigma \) with (1) \(\beta ({\varvec{x}}, \varvec{z})\sigma \subseteq {\mathcal {F}}\) and \({\phi }\in sk(\eta ({\varvec{x}}, \varvec{y}))\sigma \), and (2) there is no substitution \(\tau \supseteq \sigma \) with \(\eta ({\varvec{x}}, \varvec{y}) \tau \subseteq {\mathcal {F}}\).

The result of obliviously applying \(\rho \) to \(\mathcal {F}\), written \(\rho _{O}({\mathcal {F}})\), is the set of all oblivious consequences of \(\rho \) on \(\mathcal {F}\). The result of obliviously applying a set of TGDs \(\mathcal {R}\) to \(\mathcal {F}\), written \(\mathcal {R} _{O}({\mathcal {F}})\), is the set \(\bigcup _{\rho \in \mathcal {R}} \rho _O({\mathcal {F}}) \cup {\mathcal {F}}\). The result of restrictively applying \(\rho \) to \(\mathcal {F}\) (resp., \(\mathcal {R}\) to \(\mathcal {F}\)), written \(\rho _{R}({\mathcal {F}})\) (resp., \(\mathcal {R} _{R}({\mathcal {F}})\)), is analogously defined.
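The difference between the two kinds of consequences in Definition 1 can be sketched in Python. The encoding is hypothetical (facts and atoms as tuples, rule atoms containing only variables, Skolem symbols written as `f_<name>^<y>`), and the constant `Bob` is an illustrative assumption. The only difference between the two modes is the extra check of condition (2): a match \(\sigma \) is skipped if it can be extended so that the head already holds in \(\mathcal {F}\).

```python
def matches(atoms, facts, sigma):
    """Yield every extension of `sigma` that maps all `atoms` into `facts`."""
    if not atoms:
        yield sigma
        return
    first, rest = atoms[0], atoms[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        ext = dict(sigma)
        if all(ext.setdefault(v, t) == t for v, t in zip(first[1:], fact[1:])):
            yield from matches(rest, facts, ext)


def apply_tgd(name, body, head, exist_vars, facts, restricted=False):
    """Oblivious (default) or restricted consequences of one TGD on `facts`."""
    head_vars = {v for a in head for v in a[1:]}
    frontier = [v for v in dict.fromkeys(v for a in body for v in a[1:]) if v in head_vars]
    derived = set()
    for sigma in matches(body, facts, {}):
        # Condition (2): skip the match if some tau extending sigma satisfies the head.
        if restricted and next(matches(head, facts, sigma), None) is not None:
            continue
        full = dict(sigma)
        for y in exist_vars:
            full[y] = (f"f_{name}^{y}",) + tuple(sigma[x] for x in frontier)
        derived |= {(a[0],) + tuple(full[v] for v in a[1:]) for a in head}
    return derived


facts = {("Film", "AI"), ("isProdBy", "AI", "Bob"), ("Producer", "Bob")}
rho = ("rho", [("Film", "x")], [("isProdBy", "x", "y"), ("Producer", "y")], ["y"])
print(apply_tgd(*rho, facts))                   # introduces the Skolem term f_rho^y(AI)
print(apply_tgd(*rho, facts, restricted=True))  # empty: Bob already witnesses the head
```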

Definition 2

Let \(\leadsto \) be some total strict order over the set of all terms such that \(t \leadsto u\) only if \(dep (t) \le dep (u)\). Furthermore, we say that t is greater than u with respect to \(\leadsto \) to indicate \(t \leadsto u\).

Given a set of EGDs \(\mathcal {R}\) and a set of facts \(\mathcal {F}\), let \(\mapsto ^\mathcal {R} _{\mathcal {F}}\) be the minimal congruence relation over terms such that \(t \mapsto ^\mathcal {R} _{\mathcal {F}}u\) if and only if there exists some \(\beta ({\varvec{x}}) \rightarrow x \approx y \in \mathcal {R} \) and some substitution \(\sigma \) with \(\beta ({\varvec{x}})\sigma \subseteq {\mathcal {F}}\), \(\sigma (x) = t\) and \(\sigma (y) = u\). Let \(\mathcal {R} ({\mathcal {F}})\) be the set that is obtained from \(\mathcal {F}\) by replacing all occurrences of every term t by u where u is the greatest term with respect to \(\leadsto \) such that \(t \mapsto ^\mathcal {R} _{\mathcal {F}}u\).

Note that we define consequences with respect to sets of rules instead of single rules, as is customary [3]. This allows us to define the chase as a deterministic procedure (modulo \(\leadsto \)). Also, unlike in [3], where a lexicographic order is used to direct the replacement of terms, we employ a type of order which ensures that terms are always replaced by terms of equal or lesser depth. This effectively precludes some “deeper” terms from being introduced during the computation of the chase.
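The depth-respecting replacement of terms can be illustrated with a simplified Python sketch. It assumes that the pairs of terms equated by EGD applications have already been collected, merges them with a union-find structure, and keeps the shallowest term of every class as its representative. Two simplifications are made here and are not part of Definition 2: ties between terms of equal depth are broken by `repr`, and terms are only replaced at argument positions of facts, not inside nested terms.

```python
def depth(term):
    """Depth of a term: 0 for constants, 1 + the maximal argument depth otherwise."""
    if isinstance(term, str):
        return 0
    return 1 + max((depth(a) for a in term[1:]), default=0)


def rename(facts, equalities):
    """Merge the terms equated by EGD applications and rewrite the facts."""
    parent = {}

    def find(t):
        parent.setdefault(t, t)
        while parent[t] != t:
            t = parent[t]
        return t

    for t, u in equalities:
        keep, drop = sorted((find(t), find(u)), key=lambda s: (depth(s), repr(s)))
        parent[drop] = keep  # the shallower term survives; repr() only breaks ties

    return {(f[0],) + tuple(find(t) for t in f[1:]) for f in facts}


fa = ("f", "a")
facts = {("R", "a", fa), ("R", "a", "b"), ("B", fa)}
# A functionality EGD R(z, x1) and R(z, x2) -> x1 = x2 equates f(a) and b.
print(rename(facts, [(fa, "b")]))  # f(a) is replaced by the shallower term b
```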

Definition 3

Let \(\mathcal {P} = \langle \mathcal {R},\mathcal {I} \rangle \) be some program. The oblivious chase sequence of \(\mathcal {P}\) is the sequence \({\mathcal {F}}_0, {\mathcal {F}}_1, \ldots \) such that \({\mathcal {F}}_0 = \mathcal {I} \) and, for all \(i \ge 1\), \({\mathcal {F}}_i\) is the set of facts defined as follows.

  • If \(\mathcal {R} ^\approx ({\mathcal {F}}_{i-1}) \ne {\mathcal {F}}_{i-1}\), then \({\mathcal {F}}_i = \mathcal {R} ^\approx ({\mathcal {F}}_{i-1})\).

  • If \({\mathcal {F}}_{i-1} = \mathcal {R} ^\approx ({\mathcal {F}}_{i-1})\) and \({\mathcal {F}}_{i-1} \ne \mathcal {R} ^\forall _O({\mathcal {F}}_{i-1}) \), then \({\mathcal {F}}_i = \mathcal {R} ^\forall _O({\mathcal {F}}_{i-1})\).

  • Otherwise, \({\mathcal {F}}_{i} = \mathcal {R} ^\exists _O({\mathcal {F}}_{i-1})\).

The restricted chase sequence of \(\mathcal {P}\) is defined analogously.

For the sake of brevity, we frequently denote the oblivious (resp., restricted) chase sequence of a program \(\mathcal {P} \) with \(\mathcal {P} _{O}^{0}, \mathcal {P} _{O}^{1}, \ldots \) (resp., \(\mathcal {P} _{R}^{0}, \mathcal {P} _{R}^{1}, \ldots \)).

Definition 4

Let \(\mathcal {P}\) be some program and let \(\mathcal {R}\) be some set of rules. Then, the oblivious chase of \(\mathcal {P}\) is the set \({OC}(\mathcal {P}) = \bigcup _{i \in \mathbb {N}} \mathcal {P} _{O}^{i} \). The restricted chase of \(\mathcal {P}\), written \({RC}(\mathcal {P})\), is defined analogously.

The oblivious (resp., restricted) chase of \(\mathcal {P}\) terminates if and only if there is some i such that, for all \(j \ge i\), \(\mathcal {P} _{O}^{i} = \mathcal {P} _{O}^{j} \) (resp., \(\mathcal {P} _{R}^{i} = \mathcal {P} _{R}^{j} \)). Furthermore, the oblivious (resp., restricted) chase of a set of rules \(\mathcal {R}\) terminates if the oblivious (resp., restricted) chase of every program of the form \(\langle \mathcal {R},\mathcal {I} \rangle \) terminates.

Our definition of the chase sequence ensures that rules which do not contain existentially quantified variables are always applied with a higher priority than rules that do. Note that, by postponing the application of rules with existential variables, we may prevent them from introducing further consequences.

The (restricted or oblivious) chase of a program can be employed to solve CQ entailment [3]. I.e., a program \(\mathcal {P}\) entails a query \(\gamma \), written \(\mathcal {P} \models \gamma \), if and only if either \({OC}(\mathcal {P}) \models \exists y \bot (y)\) or \({OC}(\mathcal {P}) \models \gamma \) (resp., \({RC}(\mathcal {P}) \models \exists y \bot (y)\) or \({RC}(\mathcal {P}) \models \gamma \)). Thus, we may also use the chase to solve CQ entailment over ontologies: An ontology \(\mathcal {O}\) entails a query \(\gamma \) if and only if \({OC}(\mathcal {P} (\mathcal {O})) \models \exists y \bot (y)\) or \({OC}(\mathcal {P} (\mathcal {O})) \models \gamma \) (resp., \({RC}(\mathcal {P} (\mathcal {O})) \models \exists y \bot (y)\) or \({RC}(\mathcal {P} (\mathcal {O})) \models \gamma \)).

Fig. 2. Ontology \(\mathcal {O} = \langle \mathcal {T}, \mathcal {A} \rangle \), program \(\mathcal {P} (\mathcal {O})\) and the chase of \(\mathcal {P} (\mathcal {O})\).

For readability purposes, we say that the oblivious (resp. restricted) chase of some ontology \(\mathcal {O}\) terminates if and only if the oblivious (resp. restricted) chase of \(\mathcal {P} (\mathcal {O})\) terminates. The oblivious (resp. restricted) chase of some TBox \(\mathcal {T}\) terminates if and only if the oblivious (resp. restricted) chase of \(\mathcal {R} _\mathcal {T} \) terminates.

As expected, the restricted chase has a better behavior than the oblivious chase; i.e., in some cases, the former might terminate when the latter does not:

Example 5

Let \(\mathcal {O}\) = \(\langle \mathcal {T}, \mathcal {A} \rangle \) be as in Fig. 2. The figure also depicts the computation of the oblivious chase and that of the restricted chase of \(\mathcal {P} (\mathcal {O})\). In this case, \({RC}(\mathcal {P} (\mathcal {O}))\) terminates whereas \({OC}(\mathcal {P} (\mathcal {O}))\) does not.
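The behaviour described in this example can be reproduced with a small Python simulation. Since Fig. 2 is not reproduced in the text, the program below is an assumption reconstructed from the rules mentioned in Examples 13 and 16: the two existential rules \(\rho \) and \(\upsilon \), the two role-inversion rules between isProdBy and prod, and the single ABox fact Film(AI). The encoding (tuples for atoms, `f_<name>^<y>` for Skolem symbols) and the cut-off `max_rounds` are likewise illustrative; rules without existential variables are applied with priority, as in Definition 3.

```python
def matches(atoms, facts, sigma):
    """Yield every extension of `sigma` that maps all `atoms` into `facts`."""
    if not atoms:
        yield sigma
        return
    first, rest = atoms[0], atoms[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        ext = dict(sigma)
        if all(ext.setdefault(v, t) == t for v, t in zip(first[1:], fact[1:])):
            yield from matches(rest, facts, ext)


def apply_rule(rule, facts, restricted):
    name, body, head, exist_vars = rule
    head_vars = {v for a in head for v in a[1:]}
    frontier = [v for v in dict.fromkeys(v for a in body for v in a[1:]) if v in head_vars]
    derived = set()
    for sigma in matches(body, facts, {}):
        if restricted and next(matches(head, facts, sigma), None) is not None:
            continue  # head already satisfied: no new term is needed
        full = dict(sigma)
        for y in exist_vars:
            full[y] = (f"f_{name}^{y}",) + tuple(sigma[x] for x in frontier)
        derived |= {(a[0],) + tuple(full[v] for v in a[1:]) for a in head}
    return derived


def step(rules, facts, restricted):
    out = set(facts)
    for rule in rules:
        out |= apply_rule(rule, facts, restricted)
    return out


def chase(datalog, existential, instance, restricted, max_rounds=10):
    """Return (facts, True) at a fixpoint, or (facts, False) after max_rounds."""
    facts = set(instance)
    for _ in range(max_rounds):
        nxt = step(datalog, facts, False)  # existential-free rules first
        if nxt == facts:
            nxt = step(existential, facts, restricted)
        if nxt == facts:
            return facts, True
        facts = nxt
    return facts, False


EXISTENTIAL = [
    ("rho", [("Film", "x")], [("isProdBy", "x", "y"), ("Producer", "y")], ["y"]),
    ("upsilon", [("Producer", "x")], [("prod", "x", "y"), ("Film", "y")], ["y"]),
]
DATALOG = [
    ("inv1", [("isProdBy", "y", "x")], [("prod", "x", "y")], []),
    ("inv2", [("prod", "x", "y")], [("isProdBy", "y", "x")], []),
]
ABOX = {("Film", "AI")}

rc, rc_done = chase(DATALOG, EXISTENTIAL, ABOX, restricted=True)
oc, oc_done = chase(DATALOG, EXISTENTIAL, ABOX, restricted=False)
print(rc_done, len(rc))  # the restricted chase reaches a fixpoint
print(oc_done, len(oc))  # the oblivious chase is cut off by max_rounds
```

Under these assumptions the restricted run stops because, once prod(\(f_{\rho }^{y}(AI)\), AI) has been derived by the inversion rule, the head of \(\upsilon \) is already satisfied for \(f_{\rho }^{y}(AI)\), whereas the oblivious run keeps introducing deeper Skolem terms.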

4 Model Faithful Acyclicity

In this section we briefly describe Model Faithful Acyclicity (MFA) [6], one of the most general acyclicity conditions for sets of rules. MFA guarantees the termination of the oblivious chase of a program by requiring that no cyclic term occurs in the chase. Note that a condition such as MFA can be applied to check whether a TBox \(\mathcal {T}\) is acyclic; i.e., \(\mathcal {T}\) is MFA if and only if \(\mathcal {R} _\mathcal {T} \) is MFA.

When one is interested in checking the termination of the oblivious chase with respect to every possible instance, it is enough to check termination with respect to a special instance, the critical instance [14]. The critical instance is the minimal set which contains all possible atoms that can be formed using the relational symbols which occur in TGDs and the special constant \(\star \). Such a strategy is used by MFA to guarantee termination of a set of rules.
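Constructing the critical instance is straightforward, as the following Python sketch suggests. Because \(\star \) is the only constant, there is exactly one atom per predicate. The rule encoding (pairs of body and head atom lists) and the use of the string `"*"` for \(\star \) are assumptions made for illustration; the rules are the ones assumed for the Film/Producer TBox of Example 5.

```python
def critical_instance(rules, star="*"):
    """One atom p(*, ..., *) for every predicate p occurring in the rules."""
    preds = {(a[0], len(a) - 1) for body, head in rules for a in body + head}
    return {(p,) + (star,) * arity for p, arity in preds}


rules = [
    ([("Film", "x")], [("isProdBy", "x", "y"), ("Producer", "y")]),
    ([("Producer", "x")], [("prod", "x", "y"), ("Film", "y")]),
    ([("isProdBy", "y", "x")], [("prod", "x", "y")]),
]
print(sorted(critical_instance(rules)))
```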

While the actual definition of MFA does not preclude the existence of EGDs, equality is assumed to be axiomatized, and thus it is treated as a regular predicate (EGDs are de facto TGDs). To reflect such treatment we will use the special predicate \(\textit{Eq} \) to denote equality. However, as the following example shows, the presence of equality in a set of TGDs frequently makes the MFA membership test fail.

Example 6

Let \(\varSigma \) be the following set of rules and let \(\varSigma '\) be the set of rules that result from axiomatizing the equality predicate as usual (see Sect. 2.1 of [6]). Furthermore, let \(\mathcal {I}_\star (\varSigma ') \) be the critical instance of \(\varSigma '\).

$$\begin{aligned} \varSigma = \{&A(x) \wedge B(x) \rightarrow \exists y [R(x,y) \wedge B(y)], R(z, x_1) \wedge R(z,x_2) \rightarrow \textit{Eq} (x_1, x_2)\} \\ \textsf {Eq} = \{&\top (x) \rightarrow \textit{Eq} (x, x), \textit{Eq} (x, y) \rightarrow \textit{Eq} (y, x), \textit{Eq} ( x, z) \wedge \textit{Eq} (z, y) \rightarrow \textit{Eq} (x, y)\} \\ \varSigma ' = \{&A(x) \wedge \textit{Eq} (x, y) \rightarrow A(y), R(x, y) \wedge \textit{Eq} (x, z) \rightarrow R(z, y), \\&R(x, y) \wedge \textit{Eq} (y, z) \rightarrow R(x, z)\} \cup \varSigma \cup \textsf {Eq} \\ \mathcal {I}_\star (\varSigma ') = \{&A(\star ), R(\star , \star ), \textit{Eq} (\star ,\star ) \} \end{aligned}$$

The oblivious chase of \((\varSigma ', \mathcal {I}_\star (\varSigma '))\) does not terminate.

$$\begin{aligned} (\varSigma ', \mathcal {I}_\star (\varSigma ') )_{O}^{1} = \{&R(\star , f(\star )), B(f(\star )), \textit{Eq} (\star , f(\star )) \} \cup \mathcal {I}_\star (\varSigma ') \\ (\varSigma ', \mathcal {I}_\star (\varSigma ') )_{O}^{2} = \{&A(f(\star )), R(f(\star ), f(f(\star ))), B(f(f(\star ))), \ldots \} \\ \ldots \ldots \ldots \ldots&\ldots \ldots \ldots \ldots \end{aligned}$$

To avoid this situation, the use of singularization [14], a somewhat “less-harmful” axiomatization of equality, is proposed in [6].

Definition 7

A singularization of a rule \(\rho \) is the rule \(\rho '\) that results from performing the following transformation for every variable v in the body of \(\rho \):

  • Rename each occurrence of v using different fresh variables \(v_1, \ldots , v_n\),

  • pick some \(j = 1, \ldots , n\) and add the atoms \(\textit{Eq} (v_1, v_j), \ldots , \textit{Eq} (v_n, v_j)\) to the body of \(\rho \) and

  • replace any occurrence of v in the head of \(\rho \) with \(v_j\).

Let \(\varSigma \) be a set of TGDs and let \(\textsf {Eq}\) be the set from Example 6. A singularization of \(\varSigma \) is a set of TGDs \(\varSigma '\) which contains \(\textsf {Eq}\) and exactly one singularization of every \(\rho \in \varSigma \). Let \(\textit{Sing}(\varSigma )\) be the set of all possible singularizations of \(\varSigma \).

Example 8

Rule \(A(x) \wedge B(x) \rightarrow \exists y [R(x,y) \wedge B(y)]\) from Example 6 admits two possible singularizations: (i) \(A(x_1) \wedge B(x_2) \wedge \textit{Eq} (x_2, x_1) \rightarrow \exists y [R(x_1,y) \wedge B(y)]\) and (ii) \(A(x_1) \wedge B(x_2) \wedge \textit{Eq} (x_1, x_2) \rightarrow \exists y [R(x_2, y) \wedge B(y)]\).
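The enumeration of singularizations of a single rule can be sketched in Python as follows. The encoding is hypothetical: atoms are tuples, fresh variables are formed by appending an index to the variable name (which assumes such names are not already in use), and, as in the example above, the trivial atom \(\textit{Eq} (v_j, v_j)\) and variables with a single body occurrence are left out. The number of results is the product of the numbers of body occurrences of the repeated variables, which hints at why the set of all singularizations grows quickly.

```python
from itertools import product


def singularizations(body, head):
    """All singularizations of the rule body -> head, as (body, head) pairs."""
    occ = {}  # variable -> positions (atom index, argument index) in the body
    for i, atom in enumerate(body):
        for k, v in enumerate(atom[1:], start=1):
            occ.setdefault(v, []).append((i, k))
    repeated = [v for v, places in occ.items() if len(places) > 1]
    results = []
    for choice in product(*(range(len(occ[v])) for v in repeated)):
        new_body = [list(a) for a in body]
        keep, eqs = {}, []
        for v, j in zip(repeated, choice):
            names = [f"{v}{n + 1}" for n in range(len(occ[v]))]
            for (i, k), fresh in zip(occ[v], names):
                new_body[i][k] = fresh
            keep[v] = names[j]  # the copy that replaces v in the head
            eqs += [("Eq", fresh, names[j]) for fresh in names if fresh != names[j]]
        new_head = [(a[0],) + tuple(keep.get(t, t) for t in a[1:]) for a in head]
        results.append(([tuple(a) for a in new_body] + eqs, new_head))
    return results


for b, h in singularizations([("A", "x"), ("B", "x")], [("R", "x", "y"), ("B", "y")]):
    print(b, "->", h)
```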

Note that, for any \(\varSigma ' \in \textit{Sing}(\varSigma )\), if \(\varSigma '\) is MFA, then the oblivious chase of \(\varSigma '\) can be used to answer queries on \(\varSigma \) [6]. The use of singularization along with MFA gives rise to the following acyclicity notions.

Definition 9

For a set of TGDs \(\varSigma \), if there is some \(\varSigma ' \in \textit{Sing}(\varSigma )\) which is MFA, then \(\varSigma \) is said to be \(\text {MFA} ^\exists \). If every \(\varSigma ' \in \textit{Sing}(\varSigma )\) is MFA, then \(\varSigma \) is \(\text {MFA}^\forall \).

To some extent, the use of singularization solves the problems with equality: One can check that \(\varSigma \) in Example 6 is \(\text {MFA}^\exists \), but not \(\text {MFA}^\forall \). Nevertheless, due to the high number of possible singularizations, it is frequently not feasible to check \(\text {MFA}^\exists \) or \(\text {MFA}^\forall \) membership. A simpler alternative is to check whether \(\bigcup _{\varSigma ' \in \textit{Sing}(\varSigma )} \varSigma '\) is \(\text {MFA} \). If that is the case, then \(\varSigma \) is said to be \(\text {MFA}^\cup \). Note that in the case of Horn- \(\mathcal {SRIQ}\) TBoxes, \(\vert \bigcup _{\varSigma ' \in \textit{Sing}(\varSigma )} \varSigma ' \vert \) is actually polynomial in \(\vert \varSigma \vert \) and, as such, \(\text {MFA}^\cup \) is more feasible to check. Thus, we will use \(\text {MFA}^\cup \) as a baseline for the evaluation of the new acyclicity condition \(\text {RCA}_{n}\), which is introduced in the next section.

5 Restricted Chase Acyclicity

While MFA is quite a general acyclicity condition, it has two main drawbacks:

  1. It only considers the oblivious chase which, as we have seen in Example 5, might not terminate even though the restricted chase does, and

  2. its treatment of equality via singularization is cumbersome and inefficient in practice. Not only are \(\text {MFA}^\exists \) and \(\text {MFA}^\forall \) difficult to check, but even after a set of TGDs is established to belong to some \(\text {MFA} \) subclass, one has to employ a singularized program for reasoning purposes.

In this section, we present \(\text {RCA}_{n}\), an acyclicity notion with neither of these drawbacks: \(\text {RCA}_{n}\) verifies termination of the restricted chase of a TBox and does not require the use of cumbersome axiomatizations of the equality predicate. Furthermore, unlike MFA, \(\text {RCA}_{n}\) allows for the presence of cyclic terms in the chase up to a given depth n.

Since we are primarily interested in termination of the restricted chase of a Horn-\(\mathcal {SRIQ}\) TBox, one might wonder why we do not simply check for termination of the restricted chase for such a TBox with respect to the critical instance, as is done in the previous section with the oblivious chase. Unfortunately, this is not possible: The restricted chase of any set of existential rules always terminates with respect to the critical instance, because this instance already contains every atom over \(\star \) and hence every rule head is satisfied from the outset. Thus, we have to devise more sophisticated techniques to check the termination of the restricted chase. We start by introducing the notion of an overchase for a TBox.

Definition 10

A set of facts \(\mathcal {V}\) is an overchase for some TBox \(\mathcal {T}\) if and only if, for every \(\mathcal {O} = \langle \mathcal {T}, \mathcal {A} \rangle \), \({RC}(\mathcal {P} (\mathcal {O}))_\star \subseteq \mathcal {V} \).

Given some TBox \(\mathcal {T}\), an overchase for \(\mathcal {T}\) may be intuitively regarded as an over-approximation of the restricted chase of \(\mathcal {T}\).

Lemma 11

If there exists a finite overchase for a TBox, then the restricted chase of that TBox terminates.

Thus, to determine whether the chase of a TBox \(\mathcal {T}\) terminates, we introduce a procedure to compute an overchase for \(\mathcal {T}\) and a means to check its termination. We proceed with some preliminary notions and notation.

Definition 12

Let \(\mathcal {T}\) be some TBox and t a term. Let \(\mathcal {I} (t)\) be the set of facts defined as follows: If t is of the form \(f_{\rho }^{y}(s)\) where \(\rho = A(x) \rightarrow \exists y [R(x, y) \wedge B(y)]\), then \(\mathcal {I} (t) = \{A(s), R(s, t), B(t)\} \cup \mathcal {I} (s)\); otherwise, \(\mathcal {I} (t) = \emptyset \). Furthermore, we introduce the program \(\mathcal {U}(\mathcal {T}, t) = \langle \mathcal {R} _\mathcal {T} ^\forall \cup \mathcal {R} _\mathcal {T} ^\approx , \mathcal {I} (t) \rangle \).

Intuitively, the restricted chase of the program \(\mathcal {U}(\mathcal {T}, t)\) can be regarded as some kind of under-approximation of the facts that must occur in the chase of every program of the form \(\mathcal {P} (\langle \mathcal {T}, \mathcal {A} \rangle )\) where t occurs. I.e., if t occurs in the restricted chase sequence of any program \(\mathcal {P} (\langle \mathcal {T}, \mathcal {A} \rangle )\), then the facts in the restricted chase of \(\mathcal {U}(\mathcal {T}, t)\) must also occur (up to renaming) in the chase sequence of such program. Furthermore, due to the special priority of application of the rules during the computation of the chase, the facts in the restricted chase of \(\mathcal {U}(\mathcal {T}, t)\) must occur in the restricted chase sequence of every program of the form \(\mathcal {P} (\langle \mathcal {T}, \mathcal {A} \rangle )\) before any successors of t are introduced.

Example 13

Let \(\mathcal {O} \), \(\rho \) and \(\upsilon \) be the ontology and rules from Example 5. Then, by Definition 12:

$$\begin{aligned} \mathcal {I} (f_{\rho }^{y}(AI))&= \{Film(AI), isProdBy(AI, f_{\rho }^{y}(AI)), Producer(f_{\rho }^{y}(AI))\} \text { and }\\ {RC}(\mathcal {U}(\mathcal {T}, f_{\rho }^{y}(AI)))&= \{prod(f_{\rho }^{y}(AI), AI)\} \cup \mathcal {I} (f_{\rho }^{y}(AI)). \end{aligned}$$

All the facts in the restricted chase of \(\mathcal {U}(\mathcal {T}, t)\) occur in the restricted chase sequence of \(\mathcal {P} (\mathcal {O})\) before any successors of term \(f_{\rho }^{y}(AI)\) are introduced. This is because the rule \(isProdBy(y, x) \rightarrow prod(x, y)\) is applied with a higher priority than the rule \(\upsilon = Producer(x) \rightarrow \exists y [prod(x, y) \wedge Film(y)]\).
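The recursive construction of \(\mathcal {I} (t)\) from Definition 12 can be sketched in Python. The encoding is hypothetical: a Skolem term is a tuple (symbol, argument), and a dictionary maps each Skolem symbol \(f_{\rho }^{y}\) to the triple (A, R, B) of its existential axiom \(A \sqsubseteq \exists R.B\). The two entries below correspond to the rules \(\rho \) and \(\upsilon \) of the running example.

```python
AXIOMS = {
    "f_rho^y": ("Film", "isProdBy", "Producer"),  # Film(x) -> exists y [isProdBy(x, y), Producer(y)]
    "f_upsilon^y": ("Producer", "prod", "Film"),  # Producer(x) -> exists y [prod(x, y), Film(y)]
}


def instance_of(t, axioms):
    """The set I(t): the facts that must have produced the Skolem term t."""
    if isinstance(t, str) or t[0] not in axioms:
        return set()
    a, r, b = axioms[t[0]]
    s = t[1]
    return {(a, s), (r, s, t), (b, t)} | instance_of(s, axioms)


t = ("f_rho^y", "AI")
print(instance_of(t, AXIOMS))
```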

Given a TBox \(\mathcal {T}\) and some term of the form \(f_{\rho }^{y}(t)\), we can in some cases conclude that such a term may never occur during the computation of the restricted chase of any program of the form \(\mathcal {P} (\langle \mathcal {T}, \mathcal {A} \rangle )\) by carefully inspecting the facts in the restricted chase of \(\mathcal {U}(\mathcal {T}, t)\).

Definition 14

Let \(\mathcal {T}\) be a TBox and t a term of the form \(f_{\rho }^{y}(s)\) where \(\rho = A(x) \rightarrow \exists y [R(x, y) \wedge B(y)]\). We say that a term t is restricted with respect to \(\mathcal {T} \) if and only if there is some term u with \(\{R([s], u), B(u)\} \subseteq {RC}(\mathcal {U}(\mathcal {T}, s))\) where \([s] = [v]\), if s is replaced by v during the computation of the restricted chase sequence; and \([s] = s\), otherwise.

We often simply say that a term is “restricted”, instead of “restricted with respect to \(\mathcal {T} \),” if the TBox \(\mathcal {T} \) is clear from the context.

Lemma 15

Let \(\mathcal {T}\) be a TBox and t a restricted term. Then, for every possible \(\mathcal {O} = \langle \mathcal {T}, \mathcal {A} \rangle \), \(t \notin {RC}(\mathcal {P} (\mathcal {O}))\).

Proof

(Sketch). Let t be a term of the form \(f_{\rho }^{y}(s)\) where \(\rho = A(x) \rightarrow \exists y [R(x, y) \wedge B(y)]\). We can verify that, if t occurs during the computation of the chase sequence, then every fact in \({RC}(\mathcal {U}(\mathcal {T}, s))\) will also be included in such chase sequence before any new terms are introduced. Thus, if t is indeed restricted, there must be some u with R([s], u) and B(u) occurring in the chase sequence. Therefore, by the definition of the restricted chase, the term t may never be derived.

Fig. 3. Expansion rules for the construction of \(\mathcal {V}_\mathcal {T} \).

Example 16

Let \(\mathcal {T}\), \(\rho \) and \(\upsilon \) be the TBox and rules from Example 5. We proceed to show that the term \(f_{\rho }^{y}(f_{\upsilon }^{y}(AI))\) is restricted. First, we compute the restricted chase of \(\mathcal {U}(\mathcal {T}, f_{\upsilon }^{y}(AI)) \).

$$\begin{aligned} {RC}(\mathcal {U}(\mathcal {T}, f_{\upsilon }^{y}(AI))) = \{&Producer(AI), prod(AI, f_{\upsilon }^{y}(AI)), \\&Film(f_{\upsilon }^{y}(AI)), isProdBy(f_{\upsilon }^{y}(AI), AI)\} \end{aligned}$$

Note that \(\{isProdBy(f_{\upsilon }^{y}(AI), AI), Producer(AI)\} \subseteq {RC}(\mathcal {U}(\mathcal {T}, f_{\upsilon }^{y}(AI)))\). Thus, \(f_{\rho }^{y}(f_{\upsilon }^{y}(AI))\) is restricted with respect to \(\mathcal {T}\) and, by Lemma 15, it may not occur in the restricted chase of a program of the form \(\mathcal {P} (\langle \mathcal {T}, \mathcal {A} \rangle )\). Furthermore, by Definition 14, if \(f_{\rho }^{y}(f_{\upsilon }^{y}(AI))\) is restricted, then every term of the form \(f_{\rho }^{y}(f_{\upsilon }^{y}(c))\), where c is a constant, is also restricted.
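The restrictedness check of Definition 14 can be sketched in Python for this example. The sketch is deliberately limited and rests on assumptions: it only supports existential-free rules that are role inversions (given as a dictionary from a role to its inverse), it assumes the TBox contains no EGDs (so that \([s] = s\)), and it uses the hypothetical tuple encoding in which a Skolem symbol is mapped to the triple (A, R, B) of its axiom.

```python
AXIOMS = {
    "f_rho^y": ("Film", "isProdBy", "Producer"),
    "f_upsilon^y": ("Producer", "prod", "Film"),
}
INVERSES = {"isProdBy": "prod", "prod": "isProdBy"}  # R(x, y) -> R'(y, x)


def instance_of(t, axioms):
    """The set I(t) of Definition 12."""
    if isinstance(t, str) or t[0] not in axioms:
        return set()
    a, r, b = axioms[t[0]]
    s = t[1]
    return {(a, s), (r, s, t), (b, t)} | instance_of(s, axioms)


def saturate(facts, inverses):
    """Chase of U(T, s) when all existential-free rules are role inversions."""
    facts = set(facts)
    while True:
        new = {(inverses[f[0]], f[2], f[1])
               for f in facts if len(f) == 3 and f[0] in inverses} - facts
        if not new:
            return facts
        facts |= new


def is_restricted(t, axioms, inverses):
    """True if some u with R(s, u) and B(u) already exists in RC(U(T, s))."""
    if isinstance(t, str) or t[0] not in axioms:
        return False
    _, r, b = axioms[t[0]]
    s = t[1]
    chase = saturate(instance_of(s, axioms), inverses)
    return any(len(f) == 3 and f[0] == r and f[1] == s and (b, f[2]) in chase
               for f in chase)


t = ("f_rho^y", ("f_upsilon^y", "AI"))
print(is_restricted(t, AXIOMS, INVERSES))
```

Because the check never inspects the constant at the bottom of the term, replacing AI by any other constant gives the same answer, in line with the final remark of the example.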

With Definition 14 and Lemma 15 in place, we proceed with the definition of a procedure to construct an overchase for some given TBox \(\mathcal {T} \).

Definition 17

Let \(\mathcal {T} \) be a TBox. We define \(\mathcal {V}_\mathcal {T} \) as the set initially containing every fact in \(\mathcal {I}_\star (\mathcal {R} _\mathcal {T} ) \) which is then expanded by repeatedly applying the rules in Fig. 3 (in non-deterministic order).

Lemma 18

The set \(\mathcal {V}_\mathcal {T} \) is an overchase of the TBox \(\mathcal {T}\).

Proof

(Sketch). The lemma can be proven via induction on the chase sequence of any ontology of the form \(\mathcal {O} = \langle \mathcal {T}, \mathcal {A} \rangle \). Note that \(\mathcal {O} _{R}^{0} \subseteq \mathcal {V}_\mathcal {T} \) by the definition of \(\mathcal {V}_\mathcal {T} \). It can be verified that, for every possible derivation of a set of facts during the computation of the chase of \(\mathcal {O}\), such facts will always be contained in \(\mathcal {V}_\mathcal {T} \).

Corollary 19

The restricted chase of some TBox \(\mathcal {T}\) terminates if \(\mathcal {V}_\mathcal {T} \) is finite.

Example 20

Let \(\mathcal {T}\) be the TBox from Example 5. Then \(\mathcal {V}_\mathcal {T} \) is as follows.

$$\begin{aligned} \mathcal {V}_\mathcal {T} = \{&Film(\star ), isProdBy(\star , \star ), Producer(\star ), prod(\star , \star ),\nonumber \\&isProdBy(\star , f_{\rho }^{y}(\star )), Producer(f_{\rho }^{y}(\star )), prod(\star , f_{\upsilon }^{y}(\star )), Producer(f_{\upsilon }^{y}(\star ))\} \end{aligned}$$

Note that the terms \(f_{\rho }^{y}(f_{\upsilon }^{y}(\star ))\) and \(f_{\upsilon }^{y}(f_{\rho }^{y}(\star ))\) are restricted and thus not included in \(\mathcal {V}_\mathcal {T} \). Since \(\mathcal {V}_\mathcal {T} \) is finite, we can conclude termination of the restricted chase of the TBox \(\mathcal {T}\).

In the previous example, we were able to ascertain termination of the restricted chase of \(\mathcal {T}\) after verifying that the set \(\mathcal {V}_\mathcal {T} \) is finite. A sufficient condition for finiteness of \(\mathcal {V}_\mathcal {T} \) is to only allow cyclic terms up to a certain depth in this set. We use such condition to formally define \(\text {RCA}_{n}\).

Definition 21

A TBox \(\mathcal {T}\) is \(\text {RCA}_{n}\) if and only if there are no n-cyclic terms in \(\mathcal {V}_\mathcal {T} \). An ontology \(\langle \mathcal {T}, \mathcal {A} \rangle \) is \(\text {RCA}_{n}\) if and only if \(\mathcal {T}\) is \(\text {RCA}_{n}\).
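The depth condition behind Definition 21 can be illustrated with a short Python sketch. The definition of n-cyclic terms is given earlier in the paper and is not repeated here, so the sketch relies on an assumed reading: a term counts as n-cyclic if some function symbol occurs more than n times in it. The term encoding (nested pairs) and the function names are likewise hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple, Union

# A term is a constant (str) or a pair (function_symbol, subterm),
# e.g. ("f_rho", ("f_upsilon", "*")) encodes f_rho(f_upsilon(*)).
Term = Union[str, Tuple[str, "Term"]]


def max_symbol_count(term: Term) -> int:
    """Largest number of occurrences of a single function symbol in term."""
    counts: Counter = Counter()
    while isinstance(term, tuple):
        symbol, term = term
        counts[symbol] += 1
    return max(counts.values(), default=0)


def is_n_cyclic(term: Term, n: int) -> bool:
    """Assumed reading: some function symbol occurs more than n times."""
    return max_symbol_count(term) > n


def has_no_n_cyclic_term(terms: Iterable[Term], n: int) -> bool:
    """Check the condition of Definition 21 on an explicit, finite term set."""
    return not any(is_n_cyclic(t, n) for t in terms)


# Terms occurring in the set V_T of Example 20.
terms_example_20 = ["*", ("f_rho", "*"), ("f_upsilon", "*")]
print(has_no_n_cyclic_term(terms_example_20, 1))
```

The check only inspects a term set that has already been computed; constructing \(\mathcal {V}_\mathcal {T} \) itself requires the expansion rules of Fig. 3, which this sketch does not attempt to reproduce.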

Theorem 22

If a TBox \(\mathcal {T}\) is \(\text {RCA}_{n}\) then the restricted chase of \(\mathcal {T}\) terminates.

We proceed with several results regarding the complexity of deciding \(\text {RCA}_{n}\) membership and reasoning over \(\text {RCA}_{n}\) ontologies.

Theorem 23

Deciding whether some TBox \(\mathcal {T}\) is \(\text {RCA}_{n}\) is in ExpTime.

Theorem 24

Let \(\mathcal {O} = \langle \mathcal {T}, \mathcal {A} \rangle \) be some \(\text {RCA}_{n}\) ontology and \(\gamma \) a query. Then, checking whether \(\mathcal {O} \models \gamma \) is ExpTime-complete.

To close the section, we present several results in which we theoretically compare the generality of \(\text {RCA}_{n}\) to \(\text {MFA}^\cup \).

Theorem 25

\(\text {MFA}^\cup \) does not cover \(\text {RCA}_{1}\).

Proof

The TBox \(\mathcal {T}\) from Example 5 is \(\text {RCA}_{1}\) but not \(\text {MFA}^\cup \).

Theorem 26

If \(\mathcal {T}\) is \(\text {MFA}^\cup \) then \(\mathcal {T}\) is \(\text {RCA}_{n}\) for every \(n > \vert \mathcal {T} ^\exists \vert \) where \(\mathcal {T} ^\exists \) is the set of all existential axioms in \(\mathcal {T}\).

6 Evaluation

6.1 An Empirical Comparison of \(\text {RCA}_{n}\) and \(\text {MFA}^\cup \)

In this section we include an empirical comparison of the generality of \(\text {RCA}_{n}\) and \(\text {MFA}^\cup \). For our experiments, we use the TBoxes of the ontologies in the OWL Reasoner Evaluation workshop (ORE, https://www.w3.org/community/owled/ore-2015-workshop/) and Ontology Design Patterns (ODP, http://www.ontologydesignpatterns.org) datasets. The former is a large corpus of ontologies used in the ORE competition. The latter contains a wide range of smaller ontologies that capture design patterns commonly used in ontology modeling. Because the ORE dataset is rather large, we restrict our experiments to the 294 ontologies with the smallest number of existential axioms and skip the 77 ontologies with the largest number of such axioms. The number of existential axioms contained in an ontology is a useful metric to predict the “hardness” of acyclicity membership tests: running the experiments on the skipped ontologies would be very time-intensive, while our results, reported below, already indicate that for such very hard TBoxes \(\text {MFA}^\cup \) and \(\text {RCA}_{n}\) will likely not differ much (whereas they differ significantly for ontologies with a lower count of existential axioms).

Fig. 4. Results for the ORE and ODP Repositories.

Only Horn- \(\mathcal {SRIQ}\) TBoxes which cannot be expressed in any of the OWL 2 profiles were considered in our experiments. This is because all OWL 2 RL TBoxes are acyclic (with respect to every applicable acyclicity notion known to us), and there already exist effective algorithms and efficient implementations that solve CQ answering over OWL 2 EL and OWL 2 QL ontologies [11, 17, 18] (albeit only if these do not include complex roles).

The results from our experiments are summarized in Fig. 4. The evaluated TBoxes are sorted into brackets depending on the number of existential axioms they contain. For each bracket we provide the average number of axioms in the ontologies (“Avg. Size”), the number of ontologies (“Count”), and, for every condition “X” considered, the percentage of “X acyclic” ontologies.

\(\text {RCA}_{2}\) and \(\text {RCA}_{3}\) turned out to be indistinguishable with respect to the TBoxes considered, and thus we limit our evaluation to \(\text {RCA}_{n}\) with \(n \le 3\). Our tests reveal that \(\text {RCA}_{2}\) is significantly more general than \(\text {MFA}^\cup \), particularly when it comes to TBoxes with a low count of existential axioms. Note, however, that reasoning over ontologies with few (existential) axioms is in general not trivial: All of the ontologies considered in our materialization tests (see Fig. 5) contain fewer than 20 existential axioms. For TBoxes containing from 1 to 10 existential axioms in the ORE dataset, more than half of the ontologies which are not \(\text {MFA}^\cup \) are \(\text {RCA}_{2}\). Furthermore, the 4 ontologies in the ODP dataset which are not \(\text {MFA}^\cup \) are \(\text {RCA}_{2}\). Interestingly, in both repositories we could not find any ontology that is \(\text {MFA}^\cup \) but not \(\text {RCA}_{1}\). Thus, with respect to the TBoxes in our corpus, \(\text {RCA}_{1}\) already proves to be more general than \(\text {MFA}^\cup \).

In total, we looked at 312 ontologies, \(62\,\%\) and \(75\,\%\) of which are \(\text {MFA}^\cup \) and \(\text {RCA}_{2}\), respectively. To gauge the significance of this improvement, we roughly compare these numbers with the results presented in [6]. In that paper, the authors consider a total of 336 ontologies, of which \(49\,\%\), \(58\,\%\) and \(68\,\%\) are weakly acyclic [7], jointly acyclic [12] and \(\text {MFA}^\cup \), respectively. Even though the comparison is not over the same TBoxes, we verify that the improvement in generality of our notion is in line with previous iterations of related work.

Fig. 5. Results for Reactome, Uniprot, LUBM and UOBM (sorted from top to bottom in the above table).

6.2 A Materialization Based Reasoner

We now report on an implementation of the restricted chase as defined in Sect. 3. Moreover, we also present an implementation of the oblivious chase with singularization, i.e., the chase as it must be used if we employ \(\text {MFA}^\cup \) (see Sect. 4). We use the datalog engine RDFox [15] in both implementations.

We evaluate the performance of our chase-based implementations against Konclude [19], a very efficient OWL DL reasoner, and PAGOdA [20], a hybrid approach to query answering over ontologies. PAGOdA combines a datalog reasoner with a fully-fledged OWL 2 reasoner in order to provide scalable “pay-as-you-go” performance and is, to the best of our knowledge, the only other implementation that may solve CQ answering over Horn- \(\mathcal {SRIQ}\) ontologies with completeness guarantees, albeit only in some cases. Nevertheless, PAGOdA was able to solve all the queries in this evaluation (that is, all those for which it did not time out or run out of memory) in a sound and complete manner.

We consider two real-world ontologies in our experiments, Reactome and Uniprot, and two standard benchmarks, LUBM and UOBM, all of which contain a large number of ABox axioms. Axioms in these ontologies which are not expressible in Horn- \(\mathcal {SRIQ}\) were pruned. Furthermore, one extra axiom had to be removed from Uniprot for it to be both \(\text {MFA}^\cup \) and \(\text {RCA}_{1}\) acyclic.

The results from our experiments are summarized in Fig. 5. For each ontology, we consider four samples of the original ABox. The number of triples contained in each one of these is indicated at the beginning of each row, under the column “Triples Count”. As previously mentioned, we consider four different implementations: These include the two aforementioned variants of the chase (“Restricted” and “Oblivious”), PAGOdA (“PAGOdA”) and Konclude (“Konc.”). For both chase based implementations, we check the time it takes to compute the chase (“C”) and then the time to solve each of the four queries crafted for each ontology (“Q1–Q4”). In a similar manner, we list the time PAGOdA takes to preprocess each ontology (“P”) plus the time it takes to answer the queries (“Q1–Q4”). Finally, we list the time Konclude takes to solve realization; i.e., the task of computing every fact of the form A(a) entailed by an ontology (note that Konclude cannot solve arbitrary CQ answering). Time-outs, indicated with “TO,” were set at 1 h for materialization and 5 min for queries. We make use of the acronym “OM” to indicate that an out-of-memory error occurred. Sometimes, a time-out or an out of memory error prevents us from answering the queries: Such a situation is indicated with “-.” All experiments were performed on a MacBook Pro with 8 GB of RAM and a 2.4 GHz Intel Core i5 processor.

For each ontology, we consider four different queries, which are listed in Section B of the appendix included in the extended technical report. A summarized description of these queries, in which we ignore unary predicates, can be found in Fig. 6. For every ontology, the query Q1 is of the form \(\exists x, y, z R(x, y) \wedge R(z, y)\) where R is an existentially quantified role occurring in the TBox. It appears that PAGOdA has trouble with this kind of query, whereas the chase-based implementations efficiently solve it in all but one case. This is probably due to the design of the hybrid reasoner, which considers under- and over-approximations to provide complete answers to CQs: It appears that queries such as the one previously considered find a large number of matches in the upper bound, which slows down the performance of this reasoner. Query Q2 is acyclic, whereas queries Q3 and Q4 are cyclic (a query is acyclic if the shape of its body is acyclic). Even though it is well-known that answering acyclic CQs can be reduced to satisfiability [5], we included such a type of query in our evaluation in an attempt to verify whether solving acyclic queries is simpler than solving cyclic queries (this is indeed the case theoretically). Nevertheless, our experiments do not reveal any significant differences.
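Once the chase has been materialized, a Boolean conjunctive query can be evaluated directly over the resulting facts. The following Python sketch shows one naive way to do so via backtracking search for a match; it is only an illustration and not how RDFox or our implementation evaluates queries. The fact set, the "?"-prefix convention for variables and the function name are assumptions.

```python
from typing import Dict, List, Sequence, Set, Tuple

Atom = Tuple[str, ...]  # ("prod", "?x", "?y"); arguments starting with "?" are variables


def holds(query: Sequence[Atom], facts: Set[Atom]) -> bool:
    """Return True if all atoms of the query can be matched in facts at once."""

    def extend(atoms: List[Atom], binding: Dict[str, str]) -> bool:
        if not atoms:
            return True  # every atom has been matched
        first, rest = atoms[0], atoms[1:]
        for fact in facts:
            if fact[0] != first[0] or len(fact) != len(first):
                continue
            candidate = dict(binding)
            compatible = True
            for arg, value in zip(first[1:], fact[1:]):
                if arg.startswith("?"):
                    # Bind the variable, or check it against its earlier binding.
                    if candidate.setdefault(arg, value) != value:
                        compatible = False
                        break
                elif arg != value:
                    compatible = False
                    break
            if compatible and extend(rest, candidate):
                return True
        return False

    return extend(list(query), {})


# Hypothetical materialized facts (not taken from the evaluated ontologies).
facts = {
    ("Producer", "AI"),
    ("prod", "AI", "film1"),
    ("Film", "film1"),
    ("isProdBy", "film1", "AI"),
}

# A query of the shape of Q1 with R = prod.
q1 = [("prod", "?x", "?y"), ("prod", "?z", "?y")]
# A second, made-up query asking for a produced individual that is a Producer.
q2 = [("prod", "?x", "?y"), ("Producer", "?y")]

print(holds(q1, facts))
print(holds(q2, facts))
```

Under the usual semantics of conjunctive queries, distinct variables such as x and z may be mapped to the same individual, which is why the unary predicates omitted from the summary in Fig. 6 matter for the actual queries.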

Fig. 6. Summarized queries for Reactome (top left), Uniprot (top right), LUBM (bottom left) and UOBM (bottom right).

First, note that computing the restricted chase employing renaming techniques to deal with equality is considerably more efficient than computing the oblivious chase with singularization. We conjecture that this is because of the efficient built-in capabilities of RDFox to deal with equality and the fact that the rules that result from the application of singularization are rather cumbersome. Second, observe that our proposed algorithm is also superior to PAGOdA when it comes to CQ answering. Third, the implementation of the restricted chase outperforms the DL reasoner Konclude by an order of magnitude when it comes to solving materialization of the larger samples considered (note that, by computing the chase of a program, we already solve materialization). Our implementation also scales much better than the OWL DL reasoner on these samples.

7 Conclusions and Future Work

We introduce a novel acyclicity notion for Horn- \(\mathcal {SRIQ}\) TBoxes and prove it to be, theoretically and empirically, more general than previously existing conditions [6]. To the best of our knowledge, this is the first acyclicity notion (for ontologies or rules) which considers termination of the restricted chase algorithm. Moreover, our contribution is also relevant in practice: Based on our ideas, we produce an implementation which vastly outperforms state-of-the-art reasoners.

As future work, we plan to lift our acyclicity condition to the case of general rules; i.e., not only those resulting from the translation of Horn- \(\mathcal {SRIQ}\) TBoxes. We also intend to work on further optimizing our implementation of the \(\text {RCA}_{n}\) membership check and our restricted chase based algorithm.