Abstract
The dynamic nature of web data brings forward the need for maintaining data versions as well as identifying changes between them. In this paper, we deal with problems regarding understanding evolution, focusing on RDF(S) knowledge bases, as RDF is a de-facto standard for representing data on the web. We argue that revisiting past snapshots or the differences between them is not enough for understanding how and why data evolved. Instead, changes should be treated as first-class citizens. In our view, this involves supporting semantically rich, user-defined changes, called complex changes, as well as identifying the relations between them. In this paper, we present our perspective regarding complex changes, formally define a declarative language for defining complex changes on RDF(S) knowledge bases and present how this language is used to detect complex change instances among dataset versions, which can be queried for analyzing evolution. The approach has been extensively evaluated in terms of language expressivity and detection performance on both artificial and real data.
Availability of data and material
DBpedia datasets analyzed during this study are publicly available, while the artificial datasets can be generated by the EvoGen tool (GNU General Public License).
Code availability
The software implemented during this study is not publicly available.
References
Antoniazzi F, Viola F (2018) RDF graph visualization tools: a survey. In: 23rd conference of open innovations association (FRUCT)
Auer S, Herre H (2007) A versioning and evolution framework for RDF knowledge bases. In: Perspectives of systems informatics
Berners-Lee T, Connolly D (2004) Delta: an ontology for the distribution of differences between RDF graphs. http://www.w3.org/DesignIssues/Diff (version: 2006-05-12)
Bobed C, Maillot P, Cellier P, Ferré S (2020) Data-driven assessment of structural evolution of RDF graphs. Semantic Web 11(5):831–853
Franconi E, Meyer T, Varzinczak I (2010) Semantic diff as the basis for knowledge base versioning. In: NMR
Galani T, Papastefanatos G, Stavrakas Y (2016) A language for defining and detecting interrelated complex changes on RDF(S) knowledge bases. In: ICEIS
Galani T, Stavrakas Y, Papastefanatos G, Flouris G (2015) Supporting complex changes in RDF(S) knowledge bases. In: MEPDaW-15
Gonzalez L, Hogan A (2018) Modeling dynamics in semantic web knowledge graphs with formal concept analysis. In: WWW
Harris S, Seaborne A (2013) SPARQL query language for RDF. W3C recommendation. W3C
Kaminski M, Kostylev EV, Cuenca Grau B (2017) Query nesting, assignment, and aggregation in SPARQL 1.1. ACM TODS 42(3)
Klein M (2004) Change management for distributed ontologies. Ph.D. thesis, Vrije Universiteit Amsterdam
Maillot P, Bobed C (2018) Measuring structural similarity between RDF graphs. In: SIGAPP
Meimaris M (2016) EvoGen: a generator for synthetic versioned RDF. In: EDBT/ICDT workshops
Meimaris M, Papastefanatos G (2016) The EvoGen benchmark suite for evolving RDF data. In: MEPDaW/LDQ in ESWC
Noy NF, Musen M (2002) PromptDiff: a fixed-point algorithm for comparing ontology versions. In: AAAI
Papastefanatos G, Stavrakas Y, Galani T (2013) Capturing the history and change structure of evolving data. In: DBKDA
Papavasileiou V, Flouris G, Fundulaki I, Kotzinos D, Christophides V (2013) High-level change detection in RDF(S) KBs. ACM Trans Database Syst 38(1):1–42
Perez J, Arenas M, Gutierrez C (2009) Semantics and complexity of SPARQL. ACM TODS 34(3):1–45
Plessers P, De Troyer O, Casteleyn S (2007) Understanding ontology evolution: a change detection approach. J Web Sem 5(1):39–49
Roussakis Y, Chrysakis I, Stefanidis K, Flouris G, Stavrakas Y (2015) A flexible framework for understanding the dynamics of evolving RDF datasets. In: ISWC
Singh A, Brennan R, O’Sullivan D (2019) DELTA-LD: a change detection approach for linked datasets. In: MEPDAW in ESWC
Stojanovic L (2004) Methods and tools for ontology evolution. Ph.D. thesis, University of Karlsruhe
Troullinou G, Roussakis G, Kondylakis H, Stefanidis K, Flouris G (2016) Understanding ontology evolution beyond deltas. In: MEPDAW in EDBT/ICDT
Völkel M, Winkler W, Sure Y, Kruk S, Synak M (2005) SemVersion: a versioning system for RDF and ontologies. In: ESWC
Guo Y, Pan Z, Heflin J (2005) LUBM: a benchmark for OWL knowledge base systems. J Web Semant 3(2–3):158–182
Zeginis D, Tzitzikas Y, Christophides V (2011) On computing deltas of RDF/S knowledge bases. ACM Trans Web 5:1–36
Funding
This research is partially funded by the H2020 NEANIAS project (No. 863448).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendices
Appendix 1: Simple Changes
Simple change | Description |
Add_Type_Class(a) | Add object a of type rdfs:Class |
Delete_Type_Class(a) | Delete object a of type rdfs:Class |
Rename_Class(a,b) | Rename class a to b |
Merge_Classes(A,b) | Merge classes contained in A into b |
Merge_Classes_Into_Existing(A,b) | Merge classes in A into b, b ∈ A |
Split_Class(a,B) | Split class a into classes contained in B |
Split_Class_Into_Existing(a,B) | Split class a into classes in B, a ∈ B |
Add_Type_Property(a) | Add object a of type rdf:Property |
Delete_Type_Property(a) | Delete object a of type rdf:Property |
Rename_Property(a,b) | Rename property a to b |
Merge_Properties(A,b) | Merge properties contained in A into b |
Merge_Properties_Into_Existing(A,b) | Merge A into b, b ∈ A |
Split_Property(a,B) | Split property a into properties contained in B |
Split_Property_Into_Existing(a,B) | Split a into properties in B, a ∈ B |
Add_Type_Individual(a) | Add object a of type rdfs:Resource |
Delete_Type_Individual(a) | Delete object a of type rdfs:Resource |
Merge_Individuals(A,b) | Merge individuals contained in A into b |
Merge_Individuals_Into_Existing(A,b) | Merge A into b, b ∈ A |
Split_Individual(a,B) | Split individual a into individuals in B |
Split_Individual_Into_Existing(a,B) | Split a into individuals in B, a ∈ B |
Add_Superclass(a,b) | Parent b of class a is added |
Delete_Superclass(a,b) | Parent b of class a is deleted |
Add_Superproperty(a,b) | Parent b of property a is added |
Delete_Superproperty(a,b) | Parent b of property a is deleted |
Add_Type_To_Individual(a,b) | Type b of individual a is added |
Delete_Type_From_Individual(a,b) | Type b of individual a is deleted |
Add_Property_Instance(a1,a2,b) | Add property instance of property b |
Delete_Property_Instance(a1,a2,b) | Delete instance of property b |
Add_Domain(a,b) | Domain b of property a is added |
Delete_Domain(a,b) | Domain b of property a is deleted |
Add_Range(a,b) | Range b of property a is added |
Delete_Range(a,b) | Range b of property a is deleted |
Add_Comment(a,b) | Comment b of object a is added |
Delete_Comment(a,b) | Comment b of object a is deleted |
Change_Comment(u,a,b) | Change comment of resource u from a to b |
Add_Label(a,b) | Label b of object a is added |
Delete_Label(a,b) | Label b of object a is deleted |
Change_Label(u,a,b) | Change label of resource u from a to b |
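To make the table concrete, the following minimal sketch derives instances of four of the simple changes above (Add_Superclass, Delete_Superclass, Add_Label, Delete_Label) from two dataset versions. It assumes that each version is a plain Python set of (subject, predicate, object) string triples and that a set difference on rdfs:subClassOf and rdfs:label triples is sufficient. The names `SimpleChange` and `detect_simple_changes` and the tuple encoding are hypothetical, and this is not the detection procedure used in the paper.

```python
from typing import NamedTuple

RDFS_SUBCLASSOF = "rdfs:subClassOf"
RDFS_LABEL = "rdfs:label"


class SimpleChange(NamedTuple):
    """Hypothetical record for one simple change instance."""
    name: str
    params: tuple


# predicate -> (change name for an added triple, change name for a deleted triple)
CHANGE_NAMES = {
    RDFS_SUBCLASSOF: ("Add_Superclass", "Delete_Superclass"),
    RDFS_LABEL: ("Add_Label", "Delete_Label"),
}


def detect_simple_changes(old, new):
    """Compare two sets of triples and return the simple changes between them."""
    changes = set()
    for s, p, o in new - old:  # triples present only in the new version
        if p in CHANGE_NAMES:
            changes.add(SimpleChange(CHANGE_NAMES[p][0], (s, o)))
    for s, p, o in old - new:  # triples present only in the old version
        if p in CHANGE_NAMES:
            changes.add(SimpleChange(CHANGE_NAMES[p][1], (s, o)))
    return changes


v1 = {
    ("ex:Student", RDFS_SUBCLASSOF, "ex:Person"),
    ("ex:Student", RDFS_LABEL, "Student"),
}
v2 = {
    ("ex:Student", RDFS_SUBCLASSOF, "ex:Agent"),
    ("ex:Student", RDFS_LABEL, "Student"),
}
changes = detect_simple_changes(v1, v2)
```

Here `changes` holds one Add_Superclass and one Delete_Superclass instance for ex:Student. A Change_Label instance would additionally require pairing a deleted and an added label of the same resource, which this sketch does not attempt.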
Appendix 2: Complex Change Detection Correctness
Below we prove the correctness of the detection algorithm in Sect. 5 with respect to the semantics of the complex change language. First, a subset of the proposed language is shown to have semantics equivalent to those of a subset of SPARQL, whose semantics are defined in Perez et al. [18] and Kaminski et al. [10]. Next, this subset is augmented with the remaining features, whose semantics are implemented by applying Algorithm 3 to the result mappings of a SPARQL graph pattern.
Step 1. Consider the subset of the proposed complex change language which involves only changes with cardinalities one and "?", scalar parameters and filter expressions on scalar parameters. Complex change semantics are defined given a set of change instances \(I\) and SPARQL semantics given an RDF graph \(D\). Let \(D\) contain the RDF representation of \(I\) based on the vocabulary presented in Sect. 5.2.
(1) The abstract syntax of the proposed language is by definition equivalent to the one proposed for SPARQL in Perez et al. [18], assuming that a graph pattern involves triples for changes, except that: (a) the UNION operator is not considered, (b) the right operand of OPT shall be a graph pattern corresponding to a primitive change pattern, a filter primitive change pattern, or an optional change pattern involving only primitive change patterns, filter primitive change patterns or optional change patterns with these types of operands, (c) the right operand of OPT may be a triple that involves an optional variable \(xOPT\) (recall that if \(xOPT\in dom\left({\mu }_{c}\right)\) then \({\mu }_{c}\left(xOPT\right)=\varnothing \) or \({\mu }_{c}\left(xOPT\right)\ne \varnothing \)). All built-in filter expressions of the complex change language are SPARQL built-in filter expressions as well. For a complete SPARQL feature list, see Harris and Seaborne [9].
(2) The semantics of the proposed language are by definition equal to SPARQL semantics as in Perez et al. [18] for the syntax in (1), since they are made up of semantically equivalent operators applied on equivalent data in the same sequence.
Algorithm 3 (grouping variables are the change variables) materializes the change instances, performing a trivial grouping, where each SPARQL result mapping forms a trivial group and a new complex change instance. Overall, \(\left[\kern-0.15em\left[ {change\; pattern} \right]\kern-0.15em\right]_{I} = Algorithm3\left( {\left[\kern-0.15em\left[ {graph\; pattern} \right]\kern-0.15em\right]_{D} } \right)\).
Step 2. Augment step 1 with set parameters. Consider a change pattern with a set variable \(X\), and let \({\Omega }_{c}\) be its set of result mappings, with \({\mu }_{c}\in {\Omega }_{c}\). Since SPARQL does not support set variables, the graph pattern corresponding to the change pattern involves a scalar variable \(x\) corresponding to \(X\). Evaluating the graph pattern results in a set of mappings \(\Omega \), with \(\mu \in \Omega \). It holds that \(dom\left({\mu }_{c}\right)-\left\{X\right\}=dom\left(\mu \right)-\left\{x\right\}\). Based on step 1, for each \({\mu }_{c}\in {\Omega }_{c}\) there is a \(\mu \in \Omega \) such that \({\mu }_{c}\left(y\right)=\mu \left(y\right)\) for every \(y\in dom\left({\mu }_{c}\right)-\left\{X\right\}\). By the definition of \({\mu }_{c}\) for a set parameter, \({\mu }_{c}\left(X\right)={\cup }_{i=1, \dots , n}{\mu }_{i}\left(x\right)\), where the union ranges over all \({\mu }_{i}\in \Omega \) such that \({\mu }_{c}\left(y\right)={\mu }_{i}\left(y\right)\) for every \(y\in dom\left({\mu }_{c}\right)-\left\{X\right\}\), or simply for every such \(y\) that is a change variable. Optional set variables are handled similarly. Therefore, the complex change semantics equal the SPARQL semantics of step 1 plus Algorithm 3 for implementing set variable semantics: \(\left[\kern-0.15em\left[ {change\; pattern} \right]\kern-0.15em\right]_{I} = Algorithm3\left( {\left[\kern-0.15em\left[ {graph\; pattern} \right]\kern-0.15em\right]_{D} } \right)\).
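The construction of a set parameter from scalar SPARQL results can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's Algorithm 3: mappings are modelled as Python dicts, the function name `collect_set_parameter` and the variable names `c`, `x`, `X` are hypothetical, and grouping is performed on all variables other than the scalar variable \(x\).

```python
from collections import defaultdict


def collect_set_parameter(mappings, scalar_var, set_var):
    """Merge mappings that agree on every variable except scalar_var.

    The values bound to scalar_var within each group are unioned and
    bound to set_var in the resulting mapping.
    """
    groups = defaultdict(set)
    for mu in mappings:
        # Grouping key: all bindings of mu except the one for scalar_var.
        key = frozenset((k, v) for k, v in mu.items() if k != scalar_var)
        values = groups[key]  # creates the group even if scalar_var is unbound
        if scalar_var in mu:
            values.add(mu[scalar_var])
    return [
        {**dict(key), set_var: frozenset(values)}
        for key, values in groups.items()
    ]


# Hypothetical SPARQL result mappings: "c" is a change variable,
# "x" is the scalar variable standing in for the set variable "X".
omega = [
    {"c": "ch1", "x": "a1"},
    {"c": "ch1", "x": "a2"},
    {"c": "ch2", "x": "a3"},
]
instances = collect_set_parameter(omega, "x", "X")
```

In this example the two mappings that agree on `c = "ch1"` collapse into one mapping whose `X` contains both `a1` and `a2`, mirroring \({\mu }_{c}\left(X\right)={\cup }_{i}{\mu }_{i}\left(x\right)\).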
Step 3. Augment step 2 with filter expressions on set parameters. These expressions are not SPARQL built-in expressions. Thus, each such expression \(R\) is mapped to an equivalent expression \({R}^{{\prime}}\) in SPARQL, based on built-in features (FILTER EXISTS/NOT EXISTS, MINUS and subqueries). The exact mapping of each filter expression into SPARQL is not discussed in further detail. Also, \(R\) may combine primitive filter expressions with logical connectives. In this case, there is always an equivalent DNF expression \(DNF\left(R\right)={R}_{1}\vee {R}_{2}\vee \dots \vee {R}_{n}\). Since \(\left[\kern-0.15em\left[ {P \,FILTER\, R} \right]\kern-0.15em\right]_{I} = \left\{ {\left. {\mu \in \left[\kern-0.15em\left[ P \right]\kern-0.15em\right]_{I} } \right|\mu { \vDash }R} \right\} = \left\{ {\left. {\mu \in \left[\kern-0.15em\left[ P \right]\kern-0.15em\right]_{I} } \right|\mu { \vDash }R_{1} \vee R_{2} \vee \ldots \vee R_{n} } \right\}\) and \(\left[\kern-0.15em\left[ {P \,FILTER\, R_{i}} \right]\kern-0.15em\right]_{I} = \left\{ {\left. {\mu \in \left[\kern-0.15em\left[ P \right]\kern-0.15em\right]_{I} } \right|\mu { \vDash }R_{i} } \right\}\) for each \(i=1,\dots ,n\), it follows that \(\left[\kern-0.15em\left[ {P \,FILTER\, R} \right]\kern-0.15em\right]_{I} = \left[\kern-0.15em\left[ {P \,FILTER\, R_{1}} \right]\kern-0.15em\right]_{I} \cup \ldots \cup \left[\kern-0.15em\left[ {P \,FILTER\, R_{n}} \right]\kern-0.15em\right]_{I}\). Thus, \(P\, FILTER\, R\) can be mapped in SPARQL to the union of the graph patterns each of which consists of \(P\) and one \({R}_{i}\).
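The equality between filtering on a disjunction and taking the union of the per-disjunct filters can be checked on a small example. The sketch below is illustrative only: mappings are modelled as Python dicts, each disjunct \({R}_{i}\) is assumed to be a total Boolean predicate (SPARQL's error and unbound-variable handling in filters is ignored), and the function names and sample data are hypothetical.

```python
def filter_pattern(mappings, condition):
    """Evaluate P FILTER R: keep the mappings of P that satisfy R."""
    return [mu for mu in mappings if condition(mu)]


def filter_dnf_as_union(mappings, disjuncts):
    """Evaluate the union of P FILTER R_i over all disjuncts R_i.

    Duplicates are dropped, so the result follows set semantics.
    """
    seen, result = set(), []
    for r_i in disjuncts:
        for mu in filter_pattern(mappings, r_i):
            key = frozenset(mu.items())
            if key not in seen:
                seen.add(key)
                result.append(mu)
    return result


def as_set(mappings):
    """Order-insensitive view of a list of mappings."""
    return {frozenset(mu.items()) for mu in mappings}


# Hypothetical result mappings of a pattern P.
p_mappings = [
    {"x": "a", "n": 1},
    {"x": "b", "n": 3},
    {"x": "c", "n": 5},
]


def r1(mu):
    return mu["n"] < 2


def r2(mu):
    return mu["n"] > 3


direct = filter_pattern(p_mappings, lambda mu: r1(mu) or r2(mu))
via_union = filter_dnf_as_union(p_mappings, [r1, r2])
```

Both `direct` and `via_union` contain exactly the mappings for `a` and `c`, so the two evaluation strategies agree on this input.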
Overall, the complex change semantics are equal to the semantics of an equivalent SPARQL graph pattern plus Algorithm 3 for implementing the semantics of set variables (as in step 2). Again, \(\left[\kern-0.15em\left[ {change \;pattern} \right]\kern-0.15em\right]_{I} = Algorithm3\left( {\left[\kern-0.15em\left[ {equivalent \;graph \;pattern} \right]\kern-0.15em\right]_{D} } \right)\).
Step 4. Augment step 3 with cardinalities "+" and "*" and with the union aggregation function. The change pattern is in extended form, including groups and aggregation. In Definition 12, a group \(\Gamma =Group\left({V}_{r}^{g}, P\right)\) is defined over a change pattern \(P\) and a list of variables \({V}_{r}^{g}\) (grouping variables). In Definition 13, an aggregate is a construct of the form \(A=Aggregate\left({v}_{r}, union,\Gamma \right)\), where \({v}_{r}\) is a variable over which the \(union\) aggregate function is performed for each group \(\Gamma \). Based on the previous steps, \(P\) is mapped to a SPARQL graph pattern \(P{^{\prime}}\) such that \(\left[\kern-0.15em\left[ P \right]\kern-0.15em\right]_{I} = Algorithm3\left( {\left[\kern-0.15em\left[ P^{\prime } \right]\kern-0.15em\right]_{D} } \right)\) (3). The computation of groups and aggregation is based on the variables in \({V}_{r}^{g}\), which is by definition a superset of the variables used by Algorithm 3 in (3), since in the previous steps the grouping variables are the change variables. Thus, \(\left[\kern-0.15em\left[ A \right]\kern-0.15em\right]_{I} = Algorithm3\left( {\left[\kern-0.15em\left[ P^{\prime } \right]\kern-0.15em\right]_{D} } \right)\), where the grouping variables are those in \({V}_{r}^{g}\). The union aggregation function is implemented by Algorithm 3, which also implements set variable semantics for computing set grouping variables.
Cite this article
Galani, T., Papastefanatos, G., Stavrakas, Y. et al. Defining and Detecting Complex Changes on RDF(S) Knowledge Bases. J Data Semant 10, 367–398 (2021). https://doi.org/10.1007/s13740-021-00136-9
DOI: https://doi.org/10.1007/s13740-021-00136-9