1 Introduction

RDF has become one of the most important data formats for interoperability, knowledge representation and querying. SPARQL, the W3C standardized language for managing RDF data [11], has grown to offer great power and flexibility of querying, including support for efficient reasoning, rooted in more than a decade of intensive research in description logics. With respect to updates however, SPARQL is currently far less mature. In particular, the interplay between updates and reasoning remains completely open.

In [1], we discussed semantics of SPARQL updates for RDFS ontologies, for the cases in which the knowledge base ABox is fully materialized or, on the contrary, is reduced to its minimal core that cannot be derived using TBox axioms. The present paper continues this study of SPARQL updates, focusing on the role of inconsistency in supporting SPARQL ABox updates over materialized stores. As a minimalistic ontology language allowing for inconsistencies, we consider RDFS\(_{\lnot }\), an extension of RDFS [12] with class disjointness axioms of the form {P disjointWith Q} from OWL [16].

As a running example, we assume a triple store G with an RDFS\(_{\lnot }\) ontology (TBox) \(\mathcal {T}\) encoding an educational domain, asserting a range restriction plus mutual disjointness of concepts such as professor and student (we use Turtle syntax [2], in which \(\textsf {dw}\) abbreviates OWL’s disjointWith keyword, and \(\textsf {dom}\) and \(\textsf {rng}\) respectively stand for the domain and range keywords of RDFS).

figure a

Consider the following SPARQL update [8] request u in the context of the TBox \(\mathcal {T}\):

figure b

Consider an ABox with data on student tutors that happen to attend each other’s classes: \(\mathcal {A}_1 = \{\) :jim :attendsClassOf :ann. :ann :attendsClassOf :jim \(\}\). Here, u would create two assertions :jim :studentOf :ann and :ann :studentOf :jim. Due to the range and domain constraints in \(\mathcal {T}\), these assertions result in clashes both for Jim and for Ann. Note that all inconsistencies are in the new data, and thus we say that u is intrinsically inconsistent for the particular ABox \(\mathcal {A}_1\). We discuss how such updates can be fixed using SPARQL rewritings.

Now, let \(\mathcal {A}_2\) be the ABox \(\{\) :jim :attendsClassOf :ann. :jim a :Professor \(\}\). It is clear that after the update u, the ABox will become inconsistent with respect to \(\mathcal {T}\) due to the property assertion :jim :studentOf :ann, implying that Jim is both a professor and a student which contradicts the disjointness axiom. In contrast to the previous case, the clash here is between the prior knowledge and the new data. Based on [1] we propose three update semantics for this case, and provide efficient SPARQL rewriting algorithms for implementing them in the RDFS\(_{\lnot }\) setting.

The topic of knowledge base updates is extremely broad. Our aim in this paper is to adapt the basic belief revision operators for efficient implementation of ABox updates expressed in SPARQL 1.1, in the presence of RDFS\(_{\lnot }\) TBox axioms. In contrast to our setting, most existing work on knowledge base evolution considers updates based on sets of ground facts to be inserted or deleted. Restricting negation to class disjointness allowed us to keep the presentation clear. It is not difficult to lift our rewritings to theories with role disjointness, functionality and inequality (owl:differentFrom). We discuss related work in more detail in Sect. 6.

In the remainder of the paper, after some short preliminaries (Sect. 2) we discuss checking for intrinsic inconsistencies in Sect. 3. Then in Sect. 4 we present three semantics for dealing with general inconsistencies in the context of materialized triple stores. Sect. 5 describes our practical evaluation of the semantics. Finally, Sect. 6 puts our work in the context of existing research and provides concluding remarks.

2 Preliminaries

We introduce basic notions about RDF graphs, RDFS\(_{\lnot }\) ontologies, and SPARQL queries. We will use RDF and DL notation interchangeably, treating RDF graphs without non-standard RDFS\(_{\lnot }\) vocabulary use [19] as sets of TBox and ABox assertions.

Table 1. DL-Lite \(_{\textsc {rdfs}_{\lnot }}\) assertions vs. RDF(S), where A, \(A'\) denote concept (or, class) names, P, \(P'\) denote role (or, property) names, \(\varGamma \) is the set of IRI constants (excl. the OWL/RDF(S) vocabulary) and \(x,y \in \varGamma \). For RDF(S), we use abbreviations (\(\textsf {sc}\), \(\textsf {sp}\), \(\textsf {dom}\), \(\textsf {rng}\), \(\textsf {a}\)) as introduced in [17].

Definition 1

( RDFS \(_{\lnot }\) ABox, TBox, Triple Store). We call a set \(\mathcal {T}\) of inclusion assertions of the forms 1–5 in Table 1 an (RDFS\(_{\lnot }\)) TBox, a set \(\mathcal {A}\) of assertions of the forms 6–7 in Table 1 an (RDF) ABox, and the union \(G = \mathcal {T}\cup \mathcal {A}\) an (RDFS\(_{\lnot }\)) triple store.

Definition 2

(Interpretation, Satisfaction, Model, Consistency). An interpretation \(\langle \varDelta ^{\mathcal {I}},\cdot ^{\mathcal {I}}\rangle \) consists of a non-empty set \(\varDelta ^{\mathcal {I}}\) and an interpretation function \(\cdot ^{\mathcal {I}}\), which maps

  • each atomic concept A to a subset \(A^{\mathcal {I}}\) of \(\varDelta ^{\mathcal {I}}\),

  • each negation \(\lnot A\) of an atomic concept to \((\lnot A)^{\mathcal {I}} = \varDelta ^{\mathcal {I}} \setminus A^{\mathcal {I}}\),

  • each atomic role P to a binary relation \(P^{\mathcal {I}}\) over \(\varDelta ^{\mathcal {I}}\), and

  • each element of \(\varGamma \) to an element of \(\varDelta ^{\mathcal {I}}\).

For expressions \(\exists P\) and \(\exists P^-\), the interpretation function is defined as \((\exists P)^{\mathcal {I}} = \{x \in \varDelta ^{\mathcal {I}}\mid \exists y. (x,y) \in P^{\mathcal {I}}\}\) and \((\exists P^-)^{\mathcal {I}} = \{y \in \varDelta ^{\mathcal {I}}\mid \exists x. (x,y) \in P^{\mathcal {I}}\}\), resp. An interpretation \(\mathcal {I}\) satisfies an inclusion assertion \(E_1\sqsubseteq E_2\) (of one of the forms 1–5 in Table 1), if \(E_1^{\mathcal {I}}\subseteq E_2^{\mathcal {I}}\). Analogously, \(\mathcal {I}\) satisfies ABox assertions of the form A(x), if \(x^{\mathcal {I}}\in A^{\mathcal {I}}\), and of the form P(x, y), if \((x^{\mathcal {I}},y^{\mathcal {I}}) \in P^{\mathcal {I}}\). An interpretation \(\mathcal {I}\) is called a model of a triple store G (resp., a TBox \(\mathcal {T}\), an ABox \(\mathcal {A}\)), denoted \(\mathcal {I}\models G\) (resp., \(\mathcal {I}\models \mathcal {T}\), \(\mathcal {I}\models \mathcal {A}\)), if \(\mathcal {I}\) satisfies all assertions in G (resp., \(\mathcal {T}\), \(\mathcal {A}\)). Finally, G is called consistent, if it does not entail both C(x) and \(\lnot C(x)\) for any concept C and constant \(x\in \varGamma \), where entailment is defined as usual.

As in [1], we treat only ABox updates with WHERE clauses restricted to unions of conjunctive queries (without projection) over DL ontologies:

figure 1

Fig. 1. Minimal RDFS rules from [17] plus the class disjointness “clash” rule from OWL2 RL [16].

Definition 3

(BGP, CQ, UCQ, Query Answer). A conjunctive query (CQ) q, or basic graph pattern (BGP), is a set of atoms of the form 6–7 from Table 1, where now \(x,y\in \varGamma \cup \mathcal {V}\), \(\mathcal {V}\) a countably infinite set of variables (written as ’?’-prefixed alphanumeric strings). A union of conjunctive queries (UCQ) Q, or union pattern, is a set of CQs. We denote with \(\mathcal {V}(q)\) (or \(\mathcal {V}(Q)\)) the set of variables from \(\mathcal {V}\) occurring in q (resp., Q). An answer (under RDFS\(_{\lnot }\) Entailment) to a CQ q over a triple store G is a substitution \(\theta \) of the variables in \(\mathcal {V}(q)\) with constants in \(\varGamma \) such that every model of G satisfies all facts in \(q\theta \). We denote the set of all such answers with \( ans _{\textsf {\tiny rdfs}}(q,G)\) (or simply \( ans (q,G)\)). The set of answers to a UCQ Q is \(\bigcup _{q\in Q} ans (q,G)\).

Query answering in the presence of ontologies is done either by rule-based pre-materialization of the ABox or by query rewriting. In the RDFS\(_{\lnot }\) case, materialization in polynomial time is feasible. Let \( mat (G)\) be the triple store obtained from exhaustive application of the inference rules in Fig. 1 on a consistent triple store G. We also define a special notation \({{\mathrm{chase}}}(q, \mathcal {T})\) to denote the “materialization” (also known as chase) of an ABox resp. a BGP q w.r.t. the TBox \(\mathcal {T}\). We call all triples occurring in \({{\mathrm{chase}}}(q, \mathcal {T})\) but not in q the effects of q w.r.t. \(\mathcal {T}\).
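To make the notions of materialization and effects concrete, the following Python sketch may help. It is an illustration only and not part of the rewriting machinery discussed in this paper. Triples are encoded as plain tuples, \(\textsf {a}\) as the string "a", and the TBox is a hypothetical one chosen to agree with the prose description of the running example (domain and range restrictions on :studentOf plus disjointness of :Professor and :Student). The rule set is our reading of the four ABox-producing rules of minimal RDFS, and the names chase, effects and clashes are illustrative.

```python
# Triples are plain tuples; "a" stands for rdf:type. Hypothetical TBox that
# matches the prose description of the running example (an assumption).
TBOX = {
    (":studentOf", "dom", ":Student"),
    (":studentOf", "rng", ":Professor"),
    (":Professor", "dw", ":Student"),
}

def chase(triples, tbox):
    """Close a set of triples (or triple patterns) under the sc, sp, dom, rng rules."""
    closed = set(triples)
    while True:
        new = set()
        for (s, p, o) in closed:
            for (left, rel, right) in tbox:
                if rel == "sc" and p == "a" and o == left:
                    new.add((s, "a", right))      # subclass rule
                elif rel == "sp" and p == left:
                    new.add((s, right, o))        # subproperty rule
                elif rel == "dom" and p == left:
                    new.add((s, "a", right))      # domain rule
                elif rel == "rng" and p == left:
                    new.add((o, "a", right))      # range rule
        if new <= closed:                         # fixpoint reached
            return closed
        closed |= new

def effects(q, tbox):
    """Triples occurring in chase(q, T) but not in q."""
    return chase(q, tbox) - set(q)

def clashes(abox, tbox):
    """All (x, C, C') such that C dw C' is in the TBox and x is in both classes."""
    return {(x, c1, c2)
            for (c1, rel, c2) in tbox if rel == "dw"
            for (x, p, c) in abox
            if p == "a" and c == c1 and (x, "a", c2) in abox}
```

Under the assumed TBox, the effects of the BGP {?X :studentOf ?Y} are the two class membership patterns ?X a :Student and ?Y a :Professor, since the sketch treats variables as opaque terms.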

We now adapt the semantics for SPARQL update operations from [1].

Definition 4

(SPARQL Update Operation, Simple Update of a Triple Store). Let \(P_d\) and \(P_i\) be BGPs, and \(P_w\) a BGP or UNION pattern. Then an update operation \(u(P_d,P_i,P_w)\) has the form

$$ {{{\mathbf {\mathtt{{DELETE}}}}}} \quad P_d\quad {{{\mathbf {\mathtt{{INSERT}}}}}} \quad P_i\quad {{{\mathbf {\mathtt{{WHERE}}}}}} \quad P_w $$

Let \(G=\mathcal {T}\cup \mathcal {A}\) be a triple store then the simple update of G w.r.t. \(u(P_d,P_i,P_w)\) is defined as \(G_{u(P_d,P_i,P_w)} = (G \setminus \mathcal {A}_d ) \cup \mathcal {A}_i\), where \(\mathcal {A}_d=\bigcup _{\theta \in ans (P_w,G)} gr (P_d\theta )\), \(\mathcal {A}_i=\bigcup _{\theta \in ans (P_w,G)} gr (P_i\theta )\), and \( gr (P)\) denotes the set of ground triples in pattern P.
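The simple update of Definition 4 can be sketched in a few lines of Python. The sketch makes several simplifying assumptions that are ours: the store is given as a materialized ABox (so that plain pattern matching stands in for answers under entailment), \(P_w\) is a single BGP rather than a UNION pattern, and patterns are lists of tuples whose variables are '?'-prefixed strings. The function names are illustrative.

```python
def is_var(term):
    return term.startswith("?")

def match(pattern, store, binding=None):
    """Yield every substitution that maps all triple patterns into `store`."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for triple in store:
        extended = dict(binding)
        ok = True
        for p_term, term in zip(first, triple):
            if is_var(p_term):
                # bind the variable, or check agreement with an earlier binding
                if extended.setdefault(p_term, term) != term:
                    ok = False
                    break
            elif p_term != term:
                ok = False
                break
        if ok:
            yield from match(rest, store, extended)

def ground(pattern, theta):
    """gr(P theta): instantiate the pattern and keep only ground triples."""
    result = set()
    for tp in pattern:
        triple = tuple(theta.get(t, t) for t in tp)
        if not any(is_var(t) for t in triple):
            result.add(triple)
    return result

def simple_update(abox, p_d, p_i, p_w):
    """(A minus A_d) union A_i, with A_d and A_i collected over all answers to P_w."""
    answers = list(match(p_w, abox))
    a_d = set().union(*(ground(p_d, th) for th in answers))
    a_i = set().union(*(ground(p_i, th) for th in answers))
    return (abox - a_d) | a_i
```

Note that all answers are computed against the original store before any deletion or insertion takes place, as in the definition.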

We call a triple store G (resp. the ABox of G) materialized if the equality \(G \setminus \mathcal {T}= mat (G) \setminus \mathcal {T}\) holds. In this paper, we will always consider G to be materialized and focus on “materialization preserving” semantics for SPARQL update operations, which we dubbed Sem \(^{ mat }_{2}\) in [1] and which preserves a materialized triple store. We recall the intuition behind Sem \(^{ mat }_{2}\), given an update \(u=(P_d, P_i, P_w)\): (i) delete the instantiations of \(P_d\) along with all their causes; (ii) insert the instantiations of \(P_i\) plus all their effects.

The notion of “causes” is made precise as follows. Given an ABox assertion A, \(A^\mathrm {caus}= \{B \mid A \in {{\mathrm{chase}}}(\{B\}, \mathcal {T})\}\). In the definition of \(A^\mathrm {caus}\), if A is a class membership \(\texttt {(x \textsf {a} C)}\) where \(x\in \varGamma \cup \mathcal {V}\), then B is one of \(\texttt {(x \textsf {a} C')}\), \(\texttt {(x \textsf {P} ?Y)}\), \(\texttt {(?Y \textsf {P} x)}\) for some fresh variable ?Y, class \(\texttt {C'}\) and role \(\textsf {P}\). If A is a role participation assertion \(\texttt {(x \textsf {R} z)}\), B is of the form \(\texttt {(x \textsf {P} z)}\), for some role \(\textsf {P}\). For a SPARQL triple (possibly with variables) C we use \(C^\mathrm {caus}\) to denote a BGP computed in the same way as for the ABox assertion A above.
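As an illustration, the computation of \(A^\mathrm {caus}\) for a single triple pattern might be sketched as follows. The encoding (tuples, the string "a" for class membership, a TBox given as a set of tuples) and all function names are assumptions made for this sketch; the TBox is again a hypothetical one consistent with the description of the running example.

```python
import itertools

# Hypothetical TBox consistent with the running example (an assumption).
TBOX = {
    (":studentOf", "dom", ":Student"),
    (":studentOf", "rng", ":Professor"),
    (":Professor", "dw", ":Student"),
}

def fresh_variables(prefix="?V"):
    """Endless supply of fresh variable names ?V1, ?V2, ..."""
    return (f"{prefix}{i}" for i in itertools.count(1))

def below(name, rel, tbox):
    """All names N with N rel* name, for rel in {"sc", "sp"} (reflexive, transitive)."""
    seen, todo = {name}, [name]
    while todo:
        current = todo.pop()
        for (lower, r, upper) in tbox:
            if r == rel and upper == current and lower not in seen:
                seen.add(lower)
                todo.append(lower)
    return seen

def causes(triple, tbox, fresh):
    """Patterns B such that `triple` is in chase({B}, T)."""
    s, p, o = triple
    if p != "a":                                   # role assertion (x R z)
        return {(s, q, o) for q in below(p, "sp", tbox)}
    result = set()
    for cls in below(o, "sc", tbox):               # class membership (x a C)
        result.add((s, "a", cls))
        for (prop, rel, target) in tbox:
            if target == cls and rel in ("dom", "rng"):
                for q in below(prop, "sp", tbox):
                    y = next(fresh)                # fresh variable for the other end
                    result.add((s, q, y) if rel == "dom" else (y, q, s))
    return result
```

Under the assumed TBox, the causes of ?X a :Student are the pattern itself and ?X :studentOf ?V1, where ?V1 is fresh.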

Definition 5

(Sem \(^{ mat }_{2}\) [1]). Let \(u(P_d,P_i,P_w)\) be an update operation. Then

$$ G^{\mathbf{Sem}^{ mat }_{2}}_{u(P_d,P_i,P_w)} = G_{u(P_d^\mathrm {caus},\, P^\mathrm {eff}_i,\, \{P_w\}\{P_d^{\textit{fvars}}\})} $$

Here, \(P_d^\mathrm {caus}= \bigcup _{A \in atoms(P_d)} A^\mathrm {caus}\); \(P^\mathrm {eff}={{\mathrm{chase}}}(P,\mathcal {T})\) and \(P_d^\textit{fvars}\) is a pattern that binds variables occurring in \(P_d^\mathrm {caus}\) but not in \(P_d\) to the constants from \(\varGamma \) occurring in G.

We refer to [1] for further details, but stress that as such, Sem \(^{ mat }_{2}\) is not able to detect or deal with inconsistencies arising from extending G with instantiations of \(P_i\). In what follows, we will discuss how this can be remedied.

Remark 1

Note that although the DELETE clause \(P_d\) is syntactically a BGP, its semantics is different. Namely, triples occurring in \(P_d\) are mutually independent (cf. Definition 4), so that for every \(\theta \in ans (P_w,G)\), each atom in \(P_d\theta \cap G\) is deleted from G no matter which other atoms of \(P_d\theta \) occur in G. Therefore, \(P_d^\mathrm {caus}\) is computed atom-wise, unlike CQ rewriting [4]. Note that \(|A^\mathrm {caus}| = O(||\mathcal {T}||)\) where \(||\mathcal {T}||\) denotes the vocabulary size of \(\mathcal {T}\): in each RDFS\(_{\lnot }\) derivation, a class membership assertion can occur at most once for each class in \(\mathcal {T}\), and a role membership assertion can occur at most twice for every role in \(\mathcal {T}\). Thus, \(|P_d^\mathrm {caus}| \le 2|P_d|\cdot ||\mathcal {T}||\) and \(|P_i^\mathrm {eff}| \le |P_i|\cdot ||\mathcal {T}||\), so both can be computed in poly-time. This underpins the polynomial complexity of our rewritings.

3 Checking Consistency of a SPARQL Update

In the literature on the evolution of DL-Lite knowledge bases [5, 7], updates represented by pairs of ABoxes \(\mathcal {A}_d, \mathcal {A}_i\) have been studied. Although such an update might seem to correspond straightforwardly to the sets \(\mathcal {A}_d, \mathcal {A}_i\) in Definition 4, it is typically assumed that \(\mathcal {A}_i\) is consistent with the TBox, and thus one only needs to consider how to deal with inconsistencies between the update and the old state of the knowledge base. This a priori assumption may be insufficient for SPARQL updates, where concrete values for inserted triples are obtained from variable bindings in the WHERE clause, and depending on the bindings, the update can be either consistent or not. This is demonstrated by the update u from Sect. 1 which, when applied to the ABox \(\mathcal {A}_1\), results in an inconsistent set \(\mathcal {A}_i\) of insertions. We call this intrinsic inconsistency of an update relative to a triple store \(G = \mathcal {T}\cup \mathcal {A}\).

Definition 6

Let G be a triple store. The update u is said to be intrinsically consistent w.r.t. G if the set of new assertions \(\mathcal {A}_i\) from Definition 4 generated by applying u to G, taken in isolation from the ABox of G, does not contradict the TBox of G. Otherwise, the update is said to be intrinsically inconsistent w.r.t. G.

Intrinsic inconsistency of the update differs crucially from the inconsistency w.r.t. the old state of the knowledge base, illustrated by the ABox \(\mathcal {A}_2\) from Sect. 1. This latter case can be addressed by adopting an update policy that prefers newer assertions in case of conflicts, as studied in the context of DL-Lite KB evolutions [5], which we will discuss in Sect. 4 below. Intrinsic inconsistencies however are harder to deal with, since there is no cue which assertion should be discarded in order to avoid the inconsistency. Our proposal here is thus to discard all mutually inconsistent pairs of insertions.

We first present an algorithm for checking intrinsic inconsistency by means of SPARQL ASK queries and then a safe rewriting algorithm. This rewriting is based on the observation that clashing triples can be introduced by a combination of two bindings of variables in the WHERE clause, as the example in Sect. 1 (the ABox \(\mathcal {A}_1\)) illustrates. To handle such cases, two copies of the WHERE clause \(P_w\) are created by the rewriting in Algorithms 1 and 2, for each pair of disjoint concepts according to the TBox of the triple store. These algorithms use notation described in Remark 2 below.

figure c

Remark 2

Our rewriting algorithms rely on producing fresh copies of the WHERE clause. Assume \(\theta \), \(\theta _1\), \(\theta _2\), ... to be substitutions replacing each variable in a given formula with a distinct fresh one. For a substitution \(\sigma \), we also define \(\theta [\sigma ]\) resp. \(\theta _i[\sigma ]\) to be an extension of \(\sigma \), renaming each variable at positions not affected by \(\sigma \) with a distinct fresh one. For instance, let F be a triple (?Z :studentOf ?Y). Now, \(F\theta \) makes a variable disjoint copy of F: \(?Z_1\) :studentOf \(?Y_1\) for fresh \(?Z_1, ?Y_1\). \(F[?Z \mapsto ?X]\) is just a substitution of ?Z by ?X in F. Finally, \(F\theta [?Z \mapsto ?X]\) results in ?X :studentOf \(?Y_2\) for fresh \(?Y_2\). We assume that all occurrences of \(F\theta [\sigma ]\) stand for syntactically the same query, but that \(F\theta [\sigma _1]\) and \(F\theta [\sigma _2]\), for distinct \(\sigma _1\) and \(\sigma _2\), can only have variables in \(range(\sigma _1) \cap range(\sigma _2)\) in common. That is, the choice of fresh variables is defined by the parameterizing substitution \(\sigma \).    \(\blacksquare \)
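The renaming notation can be mimicked with a small Python helper. This is only a sketch under our own assumptions: patterns are lists of tuples, variables are '?'-prefixed strings, fresh names are formed by appending a globally increasing counter (and are assumed not to occur in the input), and the class name Renaming is hypothetical. Caching per substitution \(\sigma \) reflects the convention that all occurrences of \(F\theta [\sigma ]\) denote syntactically the same query.

```python
import itertools

class Renaming:
    """One renaming theta_i; results of theta_i[sigma] are cached per sigma."""
    _ids = itertools.count(1)          # shared counter keeps fresh names distinct

    def __init__(self):
        self._tables = {}

    def apply(self, pattern, sigma=None):
        sigma = dict(sigma or {})
        # the choice of fresh variables is determined by sigma
        table = self._tables.setdefault(frozenset(sigma.items()), {})

        def substitute(term):
            if not term.startswith("?"):
                return term            # constants are left untouched
            if term in sigma:
                return sigma[term]     # position affected by sigma
            if term not in table:
                table[term] = f"{term}_{next(Renaming._ids)}"
            return table[term]         # fresh copy of the variable

        return [tuple(substitute(t) for t in tp) for tp in pattern]
```

For F = (?Z :studentOf ?Y), calling apply with the substitution mapping ?Z to ?X yields ?X :studentOf followed by a fresh copy of ?Y, and repeating the call with the same substitution returns the same pattern.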

Using this notation, the possibility of unifying two variables ?X and ?Y in \(P_w\) on a given triple store can be tested with the query \(\{P_w\theta _1[?X \mapsto ?Z]\} \{ P_w\theta _2[?Y \mapsto ?Z]\}\) where \(\theta _1\) and \(\theta _2\) are variable renamings as in Remark 2 and ?Z is a fresh variable.

In order to check the intrinsic consistency of an update, this condition should be evaluated for every pair of variables of \(P_w\), the unification of which leads to a clash. A SPARQL ASK query based on this idea is produced by Algorithm 1. Note that it suffices to check only triples of the form \(\{?X~\textsf {a}~ ?C\}\) at line 5 of Algorithm 1, since disjointness conditions can only be formulated for concepts, according to the syntax in Table 1. Furthermore, since we are taking the facts in \(P_i^\mathrm {eff}\) extended by all facts implied by \(\mathcal {T}\), at line 6 of Algorithm 1 it suffices to check the disjointness conditions explicitly mentioned in \(\mathcal {T}\) and not all those which are implied by \(\mathcal {T}\). Note also that the DELETE clause \(P_d\) plays no role in this case, since we only consider clashes within inserted facts.

figure d

Example 1

Consider the update u from Sect. 1, in which the INSERT clause \(P_i\) can create clashing triples. To identify potential clashes, Algorithm 1 first applies the inference rule for the range constraint, and computes \(P^\mathrm {eff}_i = \{\)?X a :Student . ?Y a :Professor\(\}\). Now both variables ?X, ?Y occur in triples of type (6) from Table 1 with clashing concept names. The following ASK query is produced by Algorithm 1.

figure e

(In this and subsequent examples we omit the trivial \(\,\mathsf {FILTER}( False )\) union branch used in rewritings to initialize variables with disjunctive conditions, such as W in Algorithm 1)    \(\blacksquare \)

Suppose that an insert is not intrinsically consistent for a given triple store. One solution would be to discard it completely, should the above ASK query return \( True \). Another option which we consider here is to only discard those variable bindings from the WHERE clause, which make the INSERT clause \(P_i\) inconsistent. This is the task of the safe rewriting \({{\mathrm{safe}}}(\cdot )\) in Algorithm 2, removing all variable bindings that participate in a clash between different triples of \(P_i\). Let \(P_w\) be a WHERE clause, in which the variables ?X and ?Y should not be unified to avoid clashes. With \(\theta _1\), \(\theta _2\) being “fresh” variable renamings as in Remark 2, Algorithm 2 uses the union of \(P_w\theta _1[?X\mapsto ?Y]\) and \(P_w\theta _2[?Y\mapsto ?X]\) to eliminate unsafe bindings that send ?X and ?Y to the same value.

Example 2

Algorithm 2 extends the WHERE clause of the update u from Sect. 1 as follows:

figure f

Note that the safe rewriting can make the update void. For instance, \({{\mathrm{safe}}}(u)\) has no effect on the ABox \(\mathcal {A}_1\) from Sect. 1, since there is no cue as to which of :jim :attendsClassOf :ann, :ann :attendsClassOf :jim should be dismissed to avoid the clash. However, if we extend this ABox with assertions both satisfying the WHERE clause of u and not causing undesirable variable unifications, \({{\mathrm{safe}}}(u)\) would make insertions based on such bindings. For instance, adding the fact :bob :attendsClassOf :alice to \(\mathcal {A}_1\) would assert :bob :studentOf :alice as a result of \({{\mathrm{safe}}}(u)\).    \(\blacksquare \)
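The intended effect of the safe rewriting on the set of variable bindings can be sketched directly in Python, without going through SPARQL. The sketch below is not Algorithm 2: it operates on bindings that are assumed to have been computed from the WHERE clause beforehand, and it discards every binding that takes part in a clash with some binding (possibly itself). The TBox is a hypothetical one consistent with the running example, and all names are illustrative.

```python
# Hypothetical TBox consistent with the running example (an assumption).
TBOX = {
    (":studentOf", "dom", ":Student"),
    (":studentOf", "rng", ":Professor"),
    (":Professor", "dw", ":Student"),
}

def chase(triples, tbox):
    """Close a set of triples under the sc, sp, dom and rng rules."""
    closed = set(triples)
    while True:
        new = set()
        for (s, p, o) in closed:
            for (left, rel, right) in tbox:
                if rel == "sc" and p == "a" and o == left:
                    new.add((s, "a", right))
                elif rel == "sp" and p == left:
                    new.add((s, right, o))
                elif rel == "dom" and p == left:
                    new.add((s, "a", right))
                elif rel == "rng" and p == left:
                    new.add((o, "a", right))
        if new <= closed:
            return closed
        closed |= new

def has_clash(triples, tbox):
    """True if some individual ends up in two classes declared disjoint."""
    closed = chase(triples, tbox)
    return any((x, "a", right) in closed
               for (left, rel, right) in tbox if rel == "dw"
               for (x, p, c) in closed if p == "a" and c == left)

def instantiate(pattern, mu):
    return {tuple(mu.get(t, t) for t in tp) for tp in pattern}

def unsafe_bindings(bindings, p_i, tbox):
    """Bindings mu for which some mu' makes mu(P_i) and mu'(P_i) clash."""
    return [mu for mu in bindings
            if any(has_clash(instantiate(p_i, mu) | instantiate(p_i, nu), tbox)
                   for nu in bindings)]

def intrinsically_consistent(bindings, p_i, tbox):
    return not unsafe_bindings(bindings, p_i, tbox)

def safe_bindings(bindings, p_i, tbox):
    bad = unsafe_bindings(bindings, p_i, tbox)
    return [mu for mu in bindings if mu not in bad]
```

With the bindings obtained from \(\mathcal {A}_1\) extended by :bob :attendsClassOf :alice, only the binding for Bob and Alice survives, in line with the discussion in Example 2.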

A rationale for using \(\,\mathsf {MINUS}\,\) rather than \(\,\mathsf {FILTER}\ \mathsf {NOT}\,\mathsf {EXISTS}\,\) in Algorithm 2 (and also in a rewriting in forthcoming Sect. 4) can be illustrated by an update in which variables in the INSERT and DELETE clauses are bound in different branches of a UNION:

figure g

A safe rewriting of this update (abbreviating :attendsClassOf as :aCo) is

figure h

It can be verified that with \(\,\mathsf {FILTER}\ \mathsf {NOT}\,\mathsf {EXISTS}\,\) in place of \(\,\mathsf {MINUS}\,\) this update makes no insertions on all triple stores: the branches {?V1 :aCo ?W1} and {?V2 :aCo ?W2} are satisfied whenever {?X :aCo ?Y} is, making \(\,\mathsf {FILTER}\ \mathsf {NOT}\,\mathsf {EXISTS}\,\) evaluate to \( False \) whenever {?X :aCo ?Y} holds.
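The difference between the two operators can be reproduced on solution mappings alone. The following Python sketch reflects our reading of the SPARQL 1.1 algebra for patterns without nested filters: MINUS removes a left solution only if some right solution is compatible with it and shares at least one variable, whereas NOT EXISTS is modelled here as the absence of any compatible right solution (for BGPs and unions of BGPs this coincides with substituting the current bindings into the inner pattern). Mappings are plain dictionaries and the function names are illustrative.

```python
def compatible(mu, nu):
    """Two mappings are compatible if they agree on all shared variables."""
    return all(mu[v] == nu[v] for v in mu.keys() & nu.keys())

def minus(left, right):
    """Keep mu unless a compatible right solution shares a variable with it."""
    return [mu for mu in left
            if not any((mu.keys() & nu.keys()) and compatible(mu, nu)
                       for nu in right)]

def filter_not_exists(left, right):
    """Keep mu only if no right solution is compatible with it at all."""
    return [mu for mu in left
            if not any(compatible(mu, nu) for nu in right)]

# A left solution for {?X :aCo ?Y} and a right solution from a branch
# {?V1 :aCo ?W1} that shares no variable with it.
left = [{"?X": ":jim", "?Y": ":ann"}]
disjoint_branch = [{"?V1": ":jim", "?W1": ":ann"}]

kept_by_minus = minus(left, disjoint_branch)                   # binding survives
kept_by_not_exists = filter_not_exists(left, disjoint_branch)  # binding is dropped
```

A right solution with a disjoint domain is trivially compatible with every left solution, which is why the NOT EXISTS variant discards all bindings as soon as such a branch has any match.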

We conclude this section by formalizing the intuition of update safety. For a triple store G and an update \(u = (P_d, P_i, P_w)\), let \(\llbracket P_w \rrbracket ^u_G\) denote the set of variable bindings computed by the query “\(\,\mathsf {SELECT}?X_1, \ldots , ?X_k \,\mathsf {WHERE}\ P_w\)” over G, where \(?X_1, \ldots , ?X_k\) are the variables occurring in \(P_i\) or in \(P_d\).

Theorem 1

Let \(\mathcal {T}\) be a TBox, let u be a SPARQL update \((P_d, P_i, P_w)\), and let query \(q_u\) and update \({{\mathrm{safe}}}(u) = (P_d, P_i, P'_w)\) result from applying Algorithm 1 resp. Algorithm 2 to u and \(\mathcal {T}\). Then, the following properties hold for an arbitrary RDFS\(_{\lnot }\) triple store \(G = \mathcal {T}\cup \mathcal {A}\):

  (1) \(q_u(G) = True \) iff \(\exists \mu , \mu ' \in \llbracket P_w \rrbracket ^u_G\) s.t. \(\mu (P_i) \wedge \mu '(P_i) \wedge \mathcal {T}\models \bot \);

  (2) \( \llbracket P_w \rrbracket ^u_G \setminus \llbracket P'_w \rrbracket ^u_G = \{ \mu \in \llbracket P_w \rrbracket ^u_G \mid \exists \mu ' \in \llbracket P_w \rrbracket ^u_G \text { s.t. } \mu (P_i) \wedge \mu '(P_i) \wedge \mathcal {T}\models \bot \}. \)

4 Materialization Preserving Update Semantics

In this section we discuss resolution of inconsistencies between triples already in the triple store and newly inserted triples. Our baseline requirement for each update semantics is formulated as the following property.

Definition 7

(Consistency-preserving). Let G be a triple store and \(u(P_d, P_i, P_w)\) an update. A materialization preserving update semantics \( Sem \) is called consistency preserving in RDFS\(_{\lnot }\) if the evaluation of update u, i.e., \(G^{ Sem }_{u(P_d,P_i,P_w)}\), results in a consistent triple store.

Our consistency preserving semantics are respectively called brave, cautious and fainthearted. The brave semantics always gives priority to newly inserted triples by discarding all pre-existing information that contradicts the update. The cautious semantics is exactly the opposite, discarding inserts that are inconsistent with facts already present in the triple store; i.e., the cautious semantics never deletes facts unless explicitly required by the DELETE clause of the SPARQL update. Finally, the fainthearted semantics executes the update partially, only performing insertions for those variable bindings which do not contradict existing knowledge (again, taking into account deletions).

All semantics rely upon incremental update semantics Sem \(^{ mat }_{2}\), introduced in Sect. 2, which we aim to extend to take into account class disjointness. Note that for the present section we assume updates to be intrinsically consistent, which can be checked or enforced beforehand in a preprocessing step by the safe rewriting discussed in Sect. 3. In this section, we lift our definition of update operation to include also updates \((P_d, P_i, P_w)\) with \(P_w\) produced by the safe rewriting Algorithm 2 from some update satisfying Definition 4. What remains to be defined is the handling of clashes between newly inserted triples and triples already present in the triple store.

The intuitions of our semantics for a SPARQL update \(u(P_d, P_i, P_w)\) in the context of an RDFS\(_{\lnot }\) TBox are as follows:

  • brave semantics Sem \(^{ mat }_{\textit{brave}}\): (i) delete all instantiations of \(P_d\) and their causes, plus all the non-deleted triples in G clashing with instantiations of triples in \(P_i\) to be inserted, again also including the causes of these triples; (ii) insert the instantiations of \(P_i\) plus all their effects.

  • cautious semantics Sem \(^{ mat }_{\textit{caut}}\): (i) delete all instantiations of \(P_d\) and their causes; (ii) insert all instantiations of \(P_i\) plus all their effects, unless they clash with some non-deleted triples in G: in this latter case, do not perform the update.

  • fainthearted semantics Sem \(^{ mat }_{\textit{faint}}\): (i) delete all instantiations of \(P_d\) and their causes; (ii) insert those instantiations of \(P_i\) (plus all their effects) which do not clash with non-deleted triples in G.
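The three intuitions above can be contrasted on ground triples with a short Python sketch. It is an illustration of the intended outcomes only, not of the SPARQL rewritings presented in this section: the variable bindings of the WHERE clause are assumed to be given, the ABox is assumed to be materialized, causes are looked up among the stored triples, and for the cautious case we read "do not perform the update" as leaving the store unchanged. All function names are illustrative.

```python
def chase(triples, tbox):
    """Close a set of triples under the sc, sp, dom and rng rules."""
    closed = set(triples)
    while True:
        new = set()
        for (s, p, o) in closed:
            for (left, rel, right) in tbox:
                if rel == "sc" and p == "a" and o == left:
                    new.add((s, "a", right))
                elif rel in ("sp", "dom", "rng") and p == left:
                    new.add({"sp": (s, right, o), "dom": (s, "a", right),
                             "rng": (o, "a", right)}[rel])
        if new <= closed:
            return closed
        closed |= new

def clashing(store, new, tbox):
    """Stored class memberships that clash with a membership in `new`."""
    dis = {(l, r) for (l, rel, r) in tbox if rel == "dw"}
    dis |= {(r, l) for (l, r) in dis}              # disjointness is symmetric
    return {(x, "a", c2) for (x, p, c) in new if p == "a"
            for (c1, c2) in dis if c1 == c and (x, "a", c2) in store}

def with_causes(targets, abox, tbox):
    """Stored triples from which some target triple is derivable (targets included)."""
    return {b for b in abox if chase({b}, tbox) & targets}

def inst(pattern, mu):
    return {tuple(mu.get(t, t) for t in tp) for tp in pattern}

def after_delete(abox, tbox, p_d, bindings):
    a_d = set().union(*(inst(p_d, mu) for mu in bindings))
    return abox - with_causes(a_d, abox, tbox)

def brave(abox, tbox, p_d, p_i, bindings):
    kept = after_delete(abox, tbox, p_d, bindings)
    new = chase(set().union(*(inst(p_i, mu) for mu in bindings)), tbox)
    kept -= with_causes(clashing(kept, new, tbox), kept, tbox)
    return kept | new                              # new knowledge wins

def cautious(abox, tbox, p_d, p_i, bindings):
    kept = after_delete(abox, tbox, p_d, bindings)
    new = chase(set().union(*(inst(p_i, mu) for mu in bindings)), tbox)
    if clashing(kept, new, tbox):
        return set(abox)                           # update rejected as a whole
    return kept | new

def fainthearted(abox, tbox, p_d, p_i, bindings):
    kept = after_delete(abox, tbox, p_d, bindings)
    result = set(kept)
    for mu in bindings:                            # decide per binding
        new = chase(inst(p_i, mu), tbox)
        if not clashing(kept, new, tbox):
            result |= new
    return result
```

On the ABox \(\mathcal {A}_2\) from Sect. 1 with a TBox as described there, brave drops :jim a :Professor and inserts the new triples with their effects, whereas cautious and fainthearted leave the store unchanged because the only binding clashes with existing knowledge.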

Remark 3

Note that Sem \(^{ mat }_{2}\) is not able to cope with so called “dangling” effects – that is, triples inserted at some point for the sake of materialization, whose causes have been subsequently deleted. As pointed out in [1], one way to deal with this issue is to combine Sem \(^{ mat }_{2}\) with marking of explicitly inserted triples. This approach was implemented as a semantics Sem \(^{ mat }_{1b}\) in [1], splitting the ABox \(\mathcal {A}\) into the explicit part \(\mathcal {A}_{ex}\) and the implicit part \(\mathcal {A}_{im} = \mathcal {A}\setminus \mathcal {A}_{ex}\). \(\mathcal {A}_{ex}\) can be maintained, e.g., in a separate RDF graph using a straightforward update rewriting. Now, deleting \(P_d\) would not only retract \(P_d^\mathrm {caus}\) from \(\mathcal {A}\), but also the triples in \({{\mathrm{chase}}}(P_d^\mathrm {caus},\mathcal {T}) \setminus {{\mathrm{chase}}}(\mathcal {A}_{ex} \setminus P_d^\mathrm {caus},\mathcal {T})\). That is, the effects of \(P_d^\mathrm {caus}\) are removed unless they can be derived from facts remaining in \(\mathcal {A}\) after enforcing the deletion \(P_d\). Such an aggressive removal of dangling triples can lead to counterintuitive behavior (cf. Example 9 in [1]), and requires maintaining the explicit ABox \(\mathcal {A}_{ex}\), which is why we opted to preserve dangling effects in our rewritings.

We will now describe implementations of the three semantics above via SPARQL rewritings, which can be shown to be materialization preserving and consistency preserving.

figure i

4.1 Brave Semantics

The rewriting in Algorithm 3 implements the brave update semantics Sem \(^{ mat }_{\textit{brave}}\); it can be viewed as combining the idea of FastEvol [5] with Sem \(^{ mat }_{2}\) to handle inconsistencies by giving priority to triples that ought to be inserted, and deleting all those triples from the store that clash with the new ones.

Example 3

Example 2 in Sect. 3 provided a safe rewriting \({{\mathrm{safe}}}(u)\) of the update u from Sect. 1. According to Algorithm 3, this safe update is rewritten to:

figure j

The DELETE clause removes potential clashes for the inserted triples. Note that also property assertions implying clashes need to be deleted, which introduces fresh variables ?X1 and ?Y1. These variables have to be bound in the WHERE clause, and therefore \(P_d^\textit{fvars}\) adds two optional clauses to the WHERE clause, which is a computationally reasonable implementation of the concept \(P^\textit{fvars}\) from Definition 5.    \(\blacksquare \)

The DELETE clause \(P'_d\) of the rewritten update is initialized in Algorithm 3 with the set \(P_d\) of triples from the input update. Rewriting ensures that also all “causes” of deleted facts are removed from the store, since otherwise the materialization will re-insert deleted triples. To this end, line 1 of Algorithm 3 adds to \(P'_d\) all facts from which \(P_d\) can be derived. Then, for each triple implied by \(P_i\) (that is, for each triple in \(P_i^\mathrm {eff}\)) the algorithm computes the patterns of clashing triples and adds them to the DELETE clause \(P'_d\), along with their causes. Note that it suffices to only consider disjointness assertions that are syntactically contained in \(\mathcal {T}\) (and not those implied by \(\mathcal {T}\)), since we assume that the store G is materialized. Finally, the WHERE clause of the rewritten update is extended to satisfy the syntactic restriction that all variables in \(P'_d\) must be bound: bindings of “fresh” variables introduced to \(P'_d\) due to the domain or range constraints in \(\mathcal {T}\) are provided by the part \(P_d^\textit{fvars}\), cf. Definition 5 and Example 3. The rewritten update is evaluated over the triple store, computing its new materialized and consistent state.

In the RDFS\(_{\lnot }\) ontology language and under the restriction that only ABox updates are allowed, the brave semantics is a belief revision operator [10, 20], performing a minimal change of the RDF graph (which due to materialization can be seen both as a deductive closure of the formula representing the ABox as well as the minimal model of this formula). There is a unique way of resolving inconsistencies since the only deduction rule with more than one ABox assertion in the premise, is the clash due to class disjointness (Fig. 1): assuming intrinsic consistency, the choice of which class membership assertion to remove in order to avoid clash is univocal (new knowledge is always preferred).

Theorem 2

Algorithm 3, given a SPARQL update u and a consistent materialized triple store \(G = \mathcal {T}\cup \mathcal {A}\), computes a new consistent and materialized state w.r.t. brave semantics. The rewriting in lines 1–6 takes time polynomial in the size of u and \(\mathcal {T}\).

4.2 Cautious Semantics

Unlike Sem \(^{ mat }_{\textit{brave}}\), its cautious version Sem \(^{ mat }_{\textit{caut}}\) always gives priority to triples that are already present in the triple store, and dismisses any inserts that are inconsistent with it. We implement this semantics as follows: (i) the DELETE command does not generate inconsistencies and thus is assumed to be always possible; (ii) the update is actually executed only if the triples introduced by the INSERT clause do not clash with the state of the triple store after all deletions have been applied.

figure k

Cautious semantics thus treats insertions and deletions asymmetrically: the former depend on the latter but not the other way round. The rationale is that deletions never cause inconsistencies and can remove clashes between the old and the new data.

As in the case of brave semantics, cautious semantics is implemented using rewriting, presented in Algorithm 4. First, the algorithm issues an ASK query to check that no clashes will be generated by the INSERT clause, provided that the DELETE part of the update is executed. If no clashes are expected, in which case the ASK query returns \( False \), the brave update from the previous section is applied.

For a safe update \(u = (P_d,P_i,P_w)\), the ASK query is generated as follows. For each triple pattern \(\{?X ~\textsf {a}~ C\}\) among the effects of \(P_i\), at line 3 Algorithm 4 enumerates all concepts \(C'\) that are explicitly mentioned as disjoint with C in \(\mathcal {T}\). As in the case of brave semantics, this syntactic check is sufficient due to the assumption that the update is applied to a materialized store; for the same reason, no property assertions need to be taken into account either.

For each concept \(C'\) disjoint with C, we need to check that a triple matching the pattern \(\{?X ~\textsf {a}~ C'\}\) is in the store G and will not be deleted by u. Deletion happens if there is a pattern \(\{?Y~\textsf {a}~C'\} \in P_d^\mathrm {caus}\) such that the variable ?Y can be bound to the same value as ?X in the WHERE clause \(P_w\). Line 6 of Algorithm 4 produces such a check, using a copy of \(P_w\), in which the variable ?Y is replaced by ?X and all other variables are replaced with distinct fresh ones. Since there can be several such triple patterns in \(P_d^\mathrm {caus}\), testing for clash elimination via the DELETE clause requires a disjunctive graph pattern \(\varTheta ^-_{C'}\) constructed at line 6 and combined with \(\{?X\, \textsf {a}\, C'\}\) using \(\,\mathsf {MINUS}\,\) at line 7.

Finally, the resulting pattern is appended to the list W of clash checks using \(\,\mathsf {UNION}\,\). As a result, \(\{P_w\}.\{W\}\) queries for triples that are not deleted by u and clash with an instantiation of some class membership assertion \(\{?X ~\textsf {a}~C\} \in P_i^\mathrm {eff}\).
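The construction just described can be sketched as a small query generator. This is a hedged reconstruction from the prose only, not the literal output of Algorithm 4. It assumes that triple patterns are given as string tuples, that subjects of class membership patterns are variables, that no variable in the update is already named `?v0`, `?v1`, and so on, and that the effects of the INSERT clause and the causes of the DELETE clause are passed in precomputed. The helper names and the illustrative update are hypothetical.

```python
from itertools import count


def rename_apart(pattern, old, new, fresh):
    """Copy `pattern`, replacing variable `old` by `new` and every other variable by a fresh one."""
    mapping = {old: new}
    renamed = []
    for triple in pattern:
        terms = []
        for term in triple:
            if term.startswith("?"):
                if term not in mapping:
                    mapping[term] = f"?v{next(fresh)}"
                term = mapping[term]
            terms.append(term)
        renamed.append(tuple(terms))
    return renamed


def show(pattern):
    """Serialize a list of triple patterns as a SPARQL basic graph pattern."""
    return " ".join(f"{s} {p} {o} ." for s, p, o in pattern)


def cautious_ask(where, insert_eff, delete_caus, disjoint_pairs):
    """Build an ASK query that looks for clashes surviving the DELETE clause."""
    fresh = count()
    branches = []
    for x, p, c in insert_eff:
        if p != "a":
            continue
        # Classes explicitly declared disjoint with c (in either direction).
        rivals = sorted({b if a == c else a for a, b in disjoint_pairs if c in (a, b)})
        for rival in rivals:
            # One disjunct per delete pattern that could remove the clashing triple.
            theta = ["{ " + show(rename_apart(where, y, x, fresh)) + " }"
                     for y, q, d in delete_caus if q == "a" and d == rival]
            clash = f"{x} a {rival} ."
            if theta:
                clash += " MINUS { " + " UNION ".join(theta) + " }"
            branches.append("{ " + clash + " }")
    if not branches:
        return None  # no disjointness axiom applies, so no clash is possible
    return "ASK WHERE { { " + show(where) + " } { " + " UNION ".join(branches) + " } }"


# Illustrative update: DELETE {?Y a :Professor} INSERT {?X a :Student}
#                      WHERE  {?X :attendsClassOf ?Y}
query = cautious_ask(
    where=[("?X", ":attendsClassOf", "?Y")],
    insert_eff=[("?X", "a", ":Student")],
    delete_caus=[("?Y", "a", ":Professor")],
    disjoint_pairs=[(":Student", ":Professor")],
)
```

For the illustrative update, the generated query asks whether some ?X that attends a class is a professor and is not itself the teacher of anyone matched by the WHERE clause, which is exactly the situation in which the clashing triple would survive the deletion.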

Theorem 3

Algorithm 4, given a SPARQL update u and a consistent materialized triple store \(G = \mathcal {T}\cup \mathcal {A}\), computes a new consistent and materialized state w.r.t. cautious semantics. The rewriting in lines 1–8 takes time polynomial in the size of u and \(\mathcal {T}\).

Example 4

Algorithm 4 rewrites the safe update \({{\mathrm{safe}}}(u)\) from Example 2 as follows:

figure l

Now, consider an update \(u'\) having both INSERT and DELETE clauses:

figure m

The update \(u'\) inserts a single class membership fact and thus is always intrinsically consistent. The ASK query in Algorithm 4 takes the DELETE clause of \(u'\) into account:

figure n

   \(\blacksquare \)

4.3 Fainthearted Semantics

Our third, fainthearted semantics is meant to take an intermediate position between the cautious semantics and the brave one. A shortcoming of the cautious semantics is that a massive update can be rejected because of only a few clashing triples. To avoid discarding an update completely in such a case, the user can decide either to override the existing knowledge (that is, opt for the brave semantics) or to apply insertions only for those variable bindings that do not clash with the existing state, which is what the fainthearted semantics does.

figure o

Our realization of the idea of accommodating non-clashing inserts is based on decoupling the insert and the delete part of an update: whereas the delete is executed for all variable bindings satisfying the WHERE clause, one dismisses the inserts for variable bindings that yield clashes with the state of the store after the delete. That is, we deviate from the notion of update as an atomic operation in a different way than in the safe rewriting where both deletions and insertions are dismissed for variable bindings leading to clashes. Our motivation for such a design decision is explained next.

Assume that for each variable binding \(\mu \) returned by the WHERE pattern, we want to either insert \({{\mathrm{gr}}}(P_i\mu )\) along with deleting \({{\mathrm{gr}}}(P_d\mu )\), or dismiss \(\mu \) altogether. As an example, consider the update \(u'\) from Example 4 and the ABox \(\{\) :jim :attendsClassOf :ann. :jim a :Professor. :bob :attendsClassOf :jim \(\}\). With the variable binding \(\mu _1 = [?X\mapsto \mathtt {{:}jim}, ?Y \mapsto \mathtt {{:}ann}]\) we insert :jim a :Student, knowing that the clashing fact :jim a :Professor will be deleted by the binding \(\mu _2 = [?X\mapsto \mathtt {{:}bob}, ?Y \mapsto \mathtt {{:}jim}]\). However, if the update is atomic, this anticipated deletion will only happen if \({{\mathrm{gr}}}(P_i\mu _2)\) does not introduce clashes. Assume that it does introduce a clash (i.e., {:bob a :Professor} is also in the ABox): we then have to look one more step ahead and check whether this triple will be deleted by some variable binding \(\mu _3\), and so on. This behaviour could be realized with SPARQL path expressions, which would however impose severe syntactic restrictions on the WHERE clause \(P_w\) of the original update.

As mentioned above, our interpretation of fainthearted semantics assumes independence between the INSERT and DELETE parts of the update. To implement this, we rely on SPARQL’s flexible handling of variable bindings. Namely, we rename the variables in the DELETE clause apart from the rest of the update, and put the correspondingly renamed copy of the WHERE clause in a new UNION branch. The original WHERE clause is then rewritten (using the MINUS operator, similarly to the case of cautious semantics) to ensure that insertions are only performed for variable bindings whose clashes, if any, are removed by the DELETE clause under some variable binding. The implementation can be found in Algorithm 5.
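The decoupling of deletions and insertions can be illustrated on ground bindings. The following is a minimal sketch under stated assumptions, not Algorithm 5 itself: the solutions of the WHERE clause are passed in explicitly, the update is assumed to have the shape DELETE {?Y a :Professor} INSERT {?X a :Student} (chosen to match the bindings \(\mu _1\) and \(\mu _2\) discussed above), materialization is omitted, and the function names are hypothetical.

```python
def ground(template, mu):
    """Instantiate a list of triple patterns with the variable binding `mu`."""
    return {tuple(mu.get(term, term) for term in triple) for triple in template}


def has_clash(candidate, state, disjoint):
    """True if an inserted class membership is disjoint with a membership in `state`."""
    return any(
        p == "a" and p2 == "a" and s == s2 and frozenset((c, c2)) in disjoint
        for s, p, c in candidate
        for s2, p2, c2 in state
    )


def fainthearted_update(store, bindings, delete_tpl, insert_tpl, disjoint):
    """Toy fainthearted update: delete for all bindings, insert only where no clash remains."""
    deletes = set()
    for mu in bindings:
        deletes |= ground(delete_tpl, mu)
    after_delete = store - deletes
    inserts = set()
    for mu in bindings:
        candidate = ground(insert_tpl, mu)
        if not has_clash(candidate, after_delete, disjoint):
            inserts |= candidate  # this binding's inserts are safe
    return after_delete | inserts


disjoint = {frozenset((":Student", ":Professor"))}
abox = {
    (":jim", ":attendsClassOf", ":ann"),
    (":jim", "a", ":Professor"),
    (":bob", ":attendsClassOf", ":jim"),
}
bindings = [{"?X": ":jim", "?Y": ":ann"}, {"?X": ":bob", "?Y": ":jim"}]
delete_tpl = [("?Y", "a", ":Professor")]
insert_tpl = [("?X", "a", ":Student")]

result = fainthearted_update(abox, bindings, delete_tpl, insert_tpl, disjoint)
# Variant in which :bob is also a professor and nobody attends his class.
variant = fainthearted_update(abox | {(":bob", "a", ":Professor")},
                              bindings, delete_tpl, insert_tpl, disjoint)
```

In the variant, the deletion of `:jim a :Professor` still takes place although the insertion for the same binding (`:bob a :Student`) is dismissed, so no look-ahead over chains of bindings is needed.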

Example 5

The update \(u'\) from Example 4 is rewritten as follows by Algorithm 5:

figure p

The first union branch binds the variables in the DELETE clause (both using fresh variables). The second branch binds the variable ?X in the INSERT clause, using MINUS to remove variable bindings for which a non-deleted clash exists. The test that a clash will not be deleted is expressed using the inner MINUS operator.    \(\blacksquare \)

We conclude with a claim of correctness and polynomial complexity of the rewriting, similar to those made for the brave and cautious semantics.

Theorem 4

Algorithm 5, given a SPARQL update u and a materialized triple store \(G = \mathcal {T}\cup \mathcal {A}\), computes a new consistent and materialized state w.r.t. fainthearted semantics. The rewriting in lines 1–9 takes time polynomial in the size of u and \(\mathcal {T}\).

5 Experimental Evaluation

For each of the three semantics discussed in the previous section, we provided a preliminary implementation using the Jena API (http://jena.apache.org) and evaluated it against the Jena TDB triple store, which implements the latest SPARQL 1.1 specification. As before, for computing the initial materialization of a triple store mat(G) we rely on the on-board, forward-chaining materialization in Jena TDB using the minimal RDFS rules as in Fig. 1.

For our experiments, we used the data generated by the EUGen generator [15] for the size range of 5 to 50 Universities. We opted for using this generator as it extends the LUBM ontology [9] with chains of subclasses, making the rewritings more challenging. In our case we have used the default of \(i=20\) subclasses for each LUBM concept (e.g., Subj i Students) and made such subclasses pairwise disjoint. Moreover, we have added more disjointness axioms where appropriate, e.g., :AssociateProfessor \(\textsf {dw}\) :FullProfessor. All these TBox axioms are merged with the reduced RDFS version of LUBM used in our previous work [1]. To make the experimental results comparable with that work, we adapted the seven updates from [1]. Our prototype, as well as files containing the data, ontology, and the updates used for experiments, are made available on a dedicated Web page (see Footnote 1).

The results summarized in Table 2 show that the LUBM 50 dataset (507 MB uncompressed, 8.7 M triples after materialization) can be handled in seconds on a quad-core Intel i7 3.20 GHz machine with 16 GB RAM. For each of the three semantics, we have compared the time elapsed for rewriting and for the evaluation of the resulting update. The last line in Table 2 is the evaluation time for the original, non-rewritten update. One can notice that brave semantics Sem \(^{ mat }_{\textit{brave}}\) is often the most expensive one, since it performs the most modifications. When the number of inconsistent inserts is low though, the situation is different, and the brave semantics slightly outperforms the fainthearted semantics Sem \(^{ mat }_{\textit{faint}}\) (Updates #6 and #7), due to the more complex checks in the WHERE clause produced by Algorithm 5. For the cautious semantics Sem \(^{ mat }_{\textit{caut}}\), the numbers in the table are the construction and evaluation time of the ASK query checking for the feasibility of the update (cf. Algorithm 4). In case this ASK query returns False, the runtime of brave semantics should be added in order to obtain the total runtime of the update. Update #4 demonstrates that Sem \(^{ mat }_{\textit{caut}}\) can perform significantly worse than Sem \(^{ mat }_{\textit{faint}}\) when the number of instantiations in the original WHERE clause is high. This is because the ASK query in Sem \(^{ mat }_{\textit{caut}}\) looks for instantiations of the WHERE clause which can lead to clashes with the existing triples (using a conjunctive condition), whereas Sem \(^{ mat }_{\textit{faint}}\) reduces the set of solutions of the original WHERE clause using MINUS, which is apparently more efficient in Jena TDB.

Table 2. Evaluation results in seconds for LUBM 50

6 Related Work and Conclusions

In this paper we have taken a step further from our previous work on combining SPARQL Update and RDFS entailment by adding concept disjointness axioms, as a first step towards dealing with inconsistencies in the context of SPARQL updates. We distinguish the case of intrinsic inconsistency, localized within instantiations of the INSERT clause of a SPARQL update, and the usual case when the new information is inconsistent with the old knowledge. In the former case, our solution was to discard all solutions of the WHERE query that participate in an inconsistency. For the latter case, we discussed several reconciliation strategies, well suited for efficient implementation in SPARQL. Our preliminary implementation shows the feasibility of all proposed approaches on top of an off-the-shelf triple store supporting SPARQL and SPARQL Update (Jena TDB).

The problem of knowledge base update and belief revision has been extensively studied in the literature, although not in the context of SPARQL updates where facts to be deleted or inserted come from a query. As argued in Sect. 4.1, brave semantics implements the most established approach of adopting the new information fully via a minimal change, which is feasible in the setting of fixed RDFS\(_{\lnot }\) TBoxes. Semantics deliberating between accepting and discarding change are also known (see [10] for a survey). In [18] an approach involving user interaction to decide whether to accept or reject an individual axiom is considered, with some part of the update being computed automatically in order to ensure its consistency. We do not consider interactive procedures here (although they clearly make sense in the case of more complex TBoxes or for TBox updates). Instead, we rely on resolution strategies which are simple for the user to understand and can be efficiently encoded in SPARQL. In a practical KB editing system, one should probably combine the two approaches, e.g. for resolving the intrinsic inconsistency. Likewise, the approaches in [3, 7, 13] consider grounded updates only, whereas our focus is on the implementation of updates in SPARQL. The approach in [7] captures RDFS and several additional types of constraints and is close in spirit to our brave semantics.

Intrinsic consistency of an update is a common assumption in knowledge base update (e.g. [57, 14]), which can be easily violated in the case of SPARQL updates. It is worth noting that our resolution strategy for intrinsic inconsistency, called safe rewriting, can be combined with all three update semantics using just the basic SPARQL operators.

Much interesting work remains to be done in order to optimize rewritten updates. Moreover, we plan to extend our work towards more expressive logics and OWL profiles, namely additional axioms from OWL 2 RL or OWL 2 QL [16].