Defining and Detecting Complex Changes on RDF(S) Knowledge Bases

  • Original Article
Journal on Data Semantics

Abstract

The dynamic nature of web data brings forward the need for maintaining data versions as well as identifying changes between them. In this paper, we deal with problems regarding understanding evolution, focusing on RDF(S) knowledge bases, as RDF is a de facto standard for representing data on the web. We argue that revisiting past snapshots or the differences between them is not enough for understanding how and why data evolved. Instead, changes should be treated as first-class citizens. In our view, this involves supporting semantically rich, user-defined changes, called complex changes, as well as identifying the relations between them. In this paper, we present our perspective regarding complex changes, formally define a declarative language for defining complex changes on RDF(S) knowledge bases and present how this language is used to detect complex change instances among dataset versions, which can be queried for analyzing evolution. The approach has been extensively evaluated in terms of language expressivity and detection performance on both artificial and real data.

Availability of data and material

DBpedia datasets analyzed during this study are publicly available, while the artificial datasets can be generated by the EvoGen tool (GNU General Public License).

Code availability

The software implemented during this study is not publicly available.

Notes

  1. https://www.w3.org/TR/rdf11-concepts/

  2. https://www.w3.org/TR/rdf-schema/

  3. https://www.dbpedia.org/

  4. https://www.w3.org/TR/sparql11-query/

  5. https://javacc.org/

  6. https://virtuoso.openlinksw.com/

  7. https://github.com/mmeimaris/EvoGen

  8. https://wiki.dbpedia.org/

  9. https://www.w3.org/TR/2012/REC-owl2-overview-20121211/

  10. http://swat.cse.lehigh.edu/onto/univ-bench.owl

  11. https://www.w3.org/TR/owl-features/

References

  1. Antoniazzi F, Viola F (2018) RDF graph visualization tools: a survey. In: 23rd conference of open innovations association (FRUCT)

  2. Auer S, Herre H (2007) A versioning and evolution framework for RDF knowledge bases. In: Perspectives of systems informatics

  3. Berners-Lee T, Connolly D (2004) Delta: an ontology for the distribution of differences between RDF graphs. http://www.w3.org/DesignIssues/Diff (version: 2006-05-12)

  4. Bobed C, Maillot P, Cellier P, Ferré S (2020) Data-driven assessment of structural evolution of RDF graphs. Semantic Web 11(5):831–853

  5. Franconi E, Meyer T, Varzinczak I (2010) Semantic diff as the basis for knowledge base versioning. In: NMR

  6. Galani T, Papastefanatos G, Stavrakas Y (2016) A language for defining and detecting interrelated complex changes on RDF(S) knowledge bases. In: ICEIS

  7. Galani T, Stavrakas Y, Papastefanatos G, Flouris G (2015) Supporting complex changes in RDF(S) knowledge bases. In: MEPDaW-15

  8. Gonzalez L, Hogan A (2018) Modeling dynamics in semantic web knowledge graphs with formal concept analysis. In: WWW

  9. Harris S, Seaborne A (2013) SPARQL query language for RDF. W3C recommendation. W3C

  10. Kaminski M, Kostylev EV, Cuenca Grau B (2017) Query nesting, assignment, and aggregation in SPARQL 1.1. ACM TODS 42(3)

  11. Klein M (2004) Change management for distributed ontologies. Ph.D. thesis, Vrije Universiteit Amsterdam

  12. Maillot P, Bobed C (2018) Measuring structural similarity between RDF graphs. In: SIGAPP

  13. Meimaris M (2016) EvoGen: a generator for synthetic versioned RDF. In: EDBT/ICDT workshops

  14. Meimaris M, Papastefanatos G (2016) The EvoGen benchmark suite for evolving RDF data. In: MEPDaW/LDQ in ESWC

  15. Noy NF, Musen M (2002) PromptDiff: a fixed-point algorithm for comparing ontology versions. In: AAAI

  16. Papastefanatos G, Stavrakas Y, Galani T (2013) Capturing the history and change structure of evolving data. In: DBKDA

  17. Papavasileiou V, Flouris G, Fundulaki I, Kotzinos D, Christophides V (2013) High-level change detection in RDF(S) KBs. ACM Trans Database Syst 38(1):1–42

  18. Perez J, Arenas M, Gutierrez C (2009) Semantics and complexity of SPARQL. ACM TODS 34(3):1–45

  19. Plessers P, De Troyer O, Casteleyn S (2007) Understanding ontology evolution: a change detection approach. J Web Sem 5(1):39–49

  20. Roussakis Y, Chrysakis I, Stefanidis K, Flouris G, Stavrakas Y (2015) A flexible framework for understanding the dynamics of evolving RDF datasets. In: ISWC

  21. Singh A, Brennan R, O’Sullivan D (2019) DELTA-LD: a change detection approach for linked datasets. In: MEPDaW in ESWC

  22. Stojanovic L (2004) Methods and tools for ontology evolution. Ph.D. thesis, University of Karlsruhe

  23. Troullinou G, Roussakis G, Kondylakis H, Stefanidis K, Flouris G (2016) Understanding ontology evolution beyond deltas. In: MEPDaW in EDBT/ICDT

  24. Völkel M, Winkler W, Sure Y, Kruk S, Synak M (2005) SemVersion: a versioning system for RDF and ontologies. In: ESWC

  25. Guo Y, Pan Z, Heflin J (2005) LUBM: a benchmark for OWL knowledge base systems. J Web Semant 3(2–3):158–182

  26. Zeginis D, Tzitzikas Y, Christophides V (2011) On computing deltas of RDF/S knowledge bases. ACM Trans Web 5:1–36

Funding

This research is partially funded by the H2020 NEANIAS project (No.863448).

Author information

Corresponding author

Correspondence to Theodora Galani.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Simple Changes

Add_Type_Class(a)

Add object a of type rdfs:Class

Delete_Type_Class(a)

Delete object a of type rdfs:Class

Rename_Class(a,b)

Rename class a to b

Merge_Classes(A,b)

Merge classes contained in A into b

Merge_Classes_Into_Existing(A,b)

Merge classes in A into b, b ∈ A

Split_Class(a,B)

Split class a into classes contained in B

Split_Class_Into_Existing(a,B)

Split class a into classes in B, a ∈ B

Add_Type_Property(a)

Add object a of type rdf:Property

Delete_Type_Property(a)

Delete object a of type rdf:Property

Rename_Property(a,b)

Rename property a to b

Merge_Properties(A,b)

Merge properties contained in A into b

Merge_Properties_Into_Existing(A,b)

Merge properties in A into b, b ∈ A

Split_Property(a,B)

Split property a into properties contained in B

Split_Property_Into_Existing(a,B)

Split property a into properties in B, a ∈ B

Add_Type_Individual(a)

Add object a of type rdfs:Resource

Delete_Type_Individual(a)

Delete object a of type rdfs:Resource

Merge_Individuals(A,b)

Merge individuals contained in A into b

Merge_Individuals_Into_Existing(A,b)

Merge individuals in A into b, b ∈ A

Split_Individual(a,B)

Split individual a into individuals in B

Split_Individual_Into_Existing(a,B)

Split individual a into individuals in B, a ∈ B

Add_Superclass(a,b)

Parent b of class a is added

Delete_Superclass(a,b)

Parent b of class a is deleted

Add_Superproperty(a,b)

Parent b of property a is added

Delete_Superproperty(a,b)

Parent b of property a is deleted

Add_Type_To_Individual(a,b)

Type b of individual a is added

Delete_Type_From_Individual(a,b)

Type b of individual a is deleted

Add_Property_Instance(a1,a2,b)

Add property instance of property b

Delete_Property_Instance(a1,a2,b)

Delete instance of property b

Add_Domain(a,b)

Domain b of property a is added

Delete_Domain(a,b)

Domain b of property a is deleted

Add_Range(a,b)

Range b of property a is added

Delete_Range(a,b)

Range b of property a is deleted

Add_Comment(a,b)

Comment b of object a is added

Delete_Comment(a,b)

Comment b of object a is deleted

Change_Comment(u,a,b)

Change comment of resource u from a to b

Add_Label(a,b)

Label b of object a is added

Delete_Label(a,b)

Label b of object a is deleted

Change_Label(u,a,b)

Change label of resource u from a to b
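
To make the relation between a few of the simple changes above and low-level triple differences concrete, the following minimal Python sketch compares two versions given as sets of (subject, predicate, object) tuples and emits Add_/Delete_ instances for superclasses, superproperties, domains and ranges. The function `simple_changes`, the `Change` tuple, the predicate table and the `ex:` names are assumptions made for this illustration. It ignores all other changes in the list and any interaction between them, and it is not the detection procedure of the cited work.

```python
from typing import NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), prefixed names as plain strings


class Change(NamedTuple):
    name: str              # e.g. "Add_Superclass"
    args: Tuple[str, ...]  # e.g. (a, b)


# Hypothetical lookup table: predicate -> (change on addition, change on deletion).
# Only four predicates from the appendix are covered in this sketch.
PREDICATE_CHANGES = {
    "rdfs:subClassOf": ("Add_Superclass", "Delete_Superclass"),
    "rdfs:subPropertyOf": ("Add_Superproperty", "Delete_Superproperty"),
    "rdfs:domain": ("Add_Domain", "Delete_Domain"),
    "rdfs:range": ("Add_Range", "Delete_Range"),
}


def simple_changes(old: Set[Triple], new: Set[Triple]) -> Set[Change]:
    """Derive Add_*/Delete_* instances from the triples added or removed between versions."""
    changes = set()
    for s, p, o in new - old:  # triples present only in the new version
        if p in PREDICATE_CHANGES:
            changes.add(Change(PREDICATE_CHANGES[p][0], (s, o)))
    for s, p, o in old - new:  # triples present only in the old version
        if p in PREDICATE_CHANGES:
            changes.add(Change(PREDICATE_CHANGES[p][1], (s, o)))
    return changes


v1 = {
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:teaches", "rdfs:range", "ex:Course"),
}
v2 = {
    ("ex:Student", "rdfs:subClassOf", "ex:Agent"),
    ("ex:teaches", "rdfs:range", "ex:Course"),
}
detected = simple_changes(v1, v2)
```

With these two toy versions, `detected` holds one Add_Superclass(ex:Student, ex:Agent) and one Delete_Superclass(ex:Student, ex:Person) instance; the unchanged range triple produces nothing.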

Appendix 2: Complex Change Detection Correctness

Below we prove the correctness of the detection algorithm in Sect. 5 with respect to the semantics of the complex change language. First, a subset of the proposed language is proven to have semantics equivalent to those of a subset of SPARQL, as defined in Perez et al. [18] and Kaminski et al. [10]. Next, as the remaining features are added, the semantics are implemented by applying Algorithm 3 to the result mappings of a SPARQL graph pattern.

Step 1. Consider the subset of the proposed complex change language which involves only changes with cardinalities one and "?", scalar parameters and filter expressions on scalar parameters. Complex change semantics are defined given a set of change instances \(I\) and SPARQL semantics given an RDF graph \(D\). Let \(D\) contain the RDF representation of \(I\) based on the vocabulary presented in Sect. 5.2.

(1) The abstract syntax of the proposed language is by definition equivalent to the one proposed for SPARQL in Perez et al. [18], assuming that a graph pattern involves triples for changes, except that: (a) UNION operator is not considered, (b) the right operand of OPT shall be a graph pattern corresponding to a primitive change pattern, or a filter primitive change pattern, or an optional change pattern involving only primitive change patterns, filter primitive change patterns or optional change patterns with these types of operands, (c) the right operand of OPT may be a triple that involves an optional variable \(xOPT\) (recall, if \(xOPT\in dom\left({\mu }_{c}\right)\) then \({\mu }_{c}\left(xOPT\right)=\varnothing \) or \({\mu }_{c}\left(xOPT\right)\ne \varnothing \)). All complex change language's built-in filter expressions are SPARQL built-in filter expressions as well. For a complete SPARQL feature list, see Harris and Seaborne [9].

(2) The semantics of the proposed language are by definition equal to SPARQL semantics as in Perez et al. [18] for the syntax in (1), since they are made up of semantically equivalent operators applied on equivalent data in the same sequence.

Algorithm 3 (grouping variables are the change variables) materializes the change instances, performing a trivial grouping, where each SPARQL result mapping forms a trivial group and a new complex change instance. Overall, \(\left[\kern-0.15em\left[ {change\; pattern} \right]\kern-0.15em\right]_{I} = Algorithm3\left( {\left[\kern-0.15em\left[ {graph\; pattern} \right]\kern-0.15em\right]_{D} } \right)\).
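
Step 1 relies on OPT having the same meaning as in the SPARQL semantics of Perez et al. [18], where the result of OPT is the left outer join of two sets of mappings: the join of compatible mappings together with the left-hand mappings that have no compatible partner. The following Python sketch illustrates that operator only. Representing mappings as dictionaries, the function names, and the example change patterns (a mandatory Add_Superclass(a,b) with an optional Add_Label(a,l)) are assumptions made for this illustration, and lists are used in place of sets for readability.

```python
from typing import Dict, List

Mapping = Dict[str, str]  # variable name -> bound value


def compatible(m1: Mapping, m2: Mapping) -> bool:
    """Two mappings are compatible if they agree on every variable they share."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def join(o1: List[Mapping], o2: List[Mapping]) -> List[Mapping]:
    """Merge every pair of compatible mappings."""
    return [{**m1, **m2} for m1 in o1 for m2 in o2 if compatible(m1, m2)]


def difference(o1: List[Mapping], o2: List[Mapping]) -> List[Mapping]:
    """Keep the mappings of o1 that are compatible with no mapping of o2."""
    return [m1 for m1 in o1 if not any(compatible(m1, m2) for m2 in o2)]


def left_outer_join(o1: List[Mapping], o2: List[Mapping]) -> List[Mapping]:
    """OPT: joined mappings plus the left mappings without a partner."""
    return join(o1, o2) + difference(o1, o2)


# Hypothetical evaluation results of the two operands of OPT.
mandatory = [
    {"a": "ex:Student", "b": "ex:Agent"},
    {"a": "ex:Course", "b": "ex:Thing"},
]
optional = [{"a": "ex:Student", "l": "Student"}]

result = left_outer_join(mandatory, optional)
```

Here the first mandatory mapping is extended with the label binding, while the second is kept without one, which corresponds to the optional variable being unbound.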

Step 2. Augment step 1 with set parameters. Consider a change pattern with a set variable \(X\), and let \({\Omega }_{c}\) be the set of mappings \({\mu }_{c}\) obtained by evaluating it. Since SPARQL does not support set variables, the graph pattern corresponding to the change pattern involves a scalar variable \(x\) corresponding to \(X\). Evaluating the graph pattern results in a set \(\Omega\) of mappings \(\mu\). It holds that \(dom\left({\mu }_{c}\right)-\left\{X\right\}=dom\left(\mu \right)-\left\{x\right\}\). Based on step 1, for each \({\mu }_{c}\in {\Omega }_{c}\) there is a \(\mu \in\Omega \) such that \({\mu }_{c}\left(y\right)=\mu \left(y\right)\) for every \(y\in dom\left({\mu }_{c}\right)-\left\{X\right\}\). By the definition of \({\mu }_{c}\) for a set parameter, \({\mu }_{c}\left(X\right)={\cup }_{i=1, \dots , n}{\mu }_{i}\left(x\right)\), where the union ranges over all \({\mu }_{i}\in \Omega\) such that \({\mu }_{c}\left(y\right)={\mu }_{i}\left(y\right) \forall y\in dom\left({\mu }_{c}\right)-\left\{X\right\}\) or, more simply, for all such \(y\) that are change variables. Optional set variables are handled similarly. Therefore, the complex change semantics equal the SPARQL semantics of step 1 plus Algorithm 3, which implements the set variable semantics: \(\left[\kern-0.15em\left[ {change\; pattern} \right]\kern-0.15em\right]_{I} = Algorithm3\left( {\left[\kern-0.15em\left[ {graph\; pattern} \right]\kern-0.15em\right]_{D} } \right)\).
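
The construction of a set variable in step 2 can be sketched as a grouping of SPARQL result mappings: mappings that agree on every variable other than the scalar \(x\) fall into one group, and the values of \(x\) in that group are collected into \(X\). The Python sketch below is an illustration under stated assumptions and not the paper's Algorithm 3. The function `collect_set_variable`, the dictionary representation and the Merge_Classes-style example data are hypothetical, and every mapping is assumed to bind the scalar variable, so optional set variables are not covered.

```python
from collections import defaultdict
from typing import Dict, List


def collect_set_variable(mappings: List[Dict[str, str]],
                         scalar_var: str,
                         set_var: str) -> List[dict]:
    """Group mappings on all variables except scalar_var and union its values into set_var."""
    groups = defaultdict(set)
    for m in mappings:
        # The group key is the mapping restricted to dom(m) - {scalar_var}.
        key = frozenset((v, val) for v, val in m.items() if v != scalar_var)
        groups[key].add(m[scalar_var])
    # One output mapping per group: the shared bindings plus the collected set.
    return [{**dict(key), set_var: frozenset(values)} for key, values in groups.items()]


# Hypothetical SPARQL result: x is a merged class, b the class it was merged into.
omega = [
    {"x": "ex:Undergrad", "b": "ex:Student"},
    {"x": "ex:Postgrad", "b": "ex:Student"},
    {"x": "ex:Tutor", "b": "ex:Staff"},
]
omega_c = collect_set_variable(omega, "x", "X")
```

In this example the two mappings that share `b = ex:Student` collapse into a single mapping whose `X` contains both merged classes, while the `ex:Staff` mapping yields a singleton set.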

Step 3. Augment step 2 with filter expressions on set parameters. These expressions are not SPARQL built-in expressions. Thus, each such expression \(R\) is mapped to an equivalent \({R}^{{\prime}}\) in SPARQL, based on built-in features (FILTER EXISTS/NOT EXISTS, MINUS and subqueries). The exact mapping of each filter expression into SPARQL is not discussed in further detail. Also, \(R\) may combine primitive filter expressions with logical connectives. In this case, there is always an equivalent DNF expression \(DNF\left(R\right)={R}_{1}\vee {R}_{2}\vee \dots \vee {R}_{n}.\) Since \(\left[\kern-0.15em\left[ {P \,FILTER \,R} \right]\kern-0.15em\right]_{I} = \left\{ {\left. {\mu \in \left[\kern-0.15em\left[ {P } \right]\kern-0.15em\right]_{I}} \right|\mu { \vDash }R} \right\} = \left\{ {\left. {\mu \in \left[\kern-0.15em\left[ P \right]\kern-0.15em\right]_{I} } \right|\mu { \vDash }R_{1} \vee R_{2} \vee \ldots \vee R_{n} } \right\}\) and \(\left[\kern-0.15em\left[ {P \,FILTER \,R_1} \right]\kern-0.15em\right]_{I} = \left\{ {\left. {\mu \in \left[\kern-0.15em\left[ P \right]\kern-0.15em\right]_{I} } \right|\mu { \vDash }R_{1} } \right\}\,\ldots, \left[\kern-0.15em\left[ {P \,FILTER\, R_n} \right]\kern-0.15em\right]_{I} = \left\{ {\left. {\mu \in \left[\kern-0.15em\left[ P \right]\kern-0.15em\right]_{I} } \right|\mu { \vDash }R_{n} } \right\}\), it follows that \(\left[\kern-0.15em\left[ {P\, FILTER\, R} \right]\kern-0.15em\right]_{I} = \left[\kern-0.15em\left[ {P \,FILTER\, R_1} \right]\kern-0.15em\right]_{I} \cup \ldots \cup \left[\kern-0.15em\left[ {P \,FILTER \,R_n} \right]\kern-0.15em\right]_{I}\). Thus, \(P\, FILTER\, R\) can be mapped in SPARQL to the union of the graph patterns that each consist of \(P\) and one \(R_i\).
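
The identity used in step 3, that filtering with a disjunction gives the same mappings as the union of the filters on each disjunct, can be checked on a small example. The Python sketch below is only an illustration: the helper names, the example mappings and the two predicates `r1` and `r2` are hypothetical and are not filter expressions taken from the language definition.

```python
from typing import Callable, List


def filter_mappings(mappings: List[dict], predicate: Callable[[dict], bool]) -> List[dict]:
    """[[P FILTER R]]: keep the mappings of P that satisfy R."""
    return [m for m in mappings if predicate(m)]


def union(*results: List[dict]) -> List[dict]:
    """Set union of mapping lists, dropping duplicates and keeping first-seen order."""
    out = []
    for result in results:
        for m in result:
            if m not in out:
                out.append(m)
    return out


# Hypothetical evaluation of a change pattern P with a set variable X.
P = [
    {"b": "ex:Student", "X": frozenset({"ex:Undergrad", "ex:Postgrad"})},
    {"b": "ex:Staff", "X": frozenset({"ex:Tutor"})},
    {"b": "ex:Room", "X": frozenset({"ex:Lab"})},
]


def r1(m: dict) -> bool:
    # Hypothetical disjunct R1: X has more than one member.
    return len(m["X"]) > 1


def r2(m: dict) -> bool:
    # Hypothetical disjunct R2: X contains ex:Tutor.
    return "ex:Tutor" in m["X"]


direct = filter_mappings(P, lambda m: r1(m) or r2(m))              # P FILTER (R1 or R2)
via_union = union(filter_mappings(P, r1), filter_mappings(P, r2))  # union of the two filters
```

Both `direct` and `via_union` contain the `ex:Student` and `ex:Staff` mappings and exclude the `ex:Room` one. In general the two results are equal as sets, since the order of mappings carries no meaning.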

Overall, the complex change semantics are equal to the semantics of an equivalent SPARQL graph pattern plus Algorithm 3 for implementing the semantics of set variables (as in step 2). Again, \(\left[\kern-0.15em\left[ {change \;pattern} \right]\kern-0.15em\right]_{I} = Algorithm3\left( {\left[\kern-0.15em\left[ {equivalent \;graph \;pattern} \right]\kern-0.15em\right]_{D} } \right)\).

Step 4. Augment step 3 with cardinalities "+" and "*" and with the union aggregation function. The change pattern is in extended form, including groups and aggregation. In Definition 12, a group \(\Gamma =Group\left({V}_{r}^{g}, P\right)\) is defined over a change pattern \(P\) and a list of variables \({V}_{r}^{g}\) (grouping variables). In Definition 13, an aggregate is a construct of the form \(A=Aggregate\left({v}_{r}, union,\Gamma \right)\) where \({v}_{r}\) is a variable over which the \(union\) aggregate function is performed for each group \(\Gamma \). Based on the previous steps, \(P\) is mapped to a SPARQL graph pattern \(P{^{\prime}}\), such that \(\left[\kern-0.15em\left[ P \right]\kern-0.15em\right]_{I} = Algorithm3\left( {\left[\kern-0.15em\left[ P^{\prime } \right]\kern-0.15em\right]_{D} } \right)\) (3). Group and aggregation computation is based on the variables in \({V}_{r}^{g}\), which is by definition a superset of the variables used by Algorithm 3 in (3), since in the previous steps the grouping variables are the change variables. Thus, \(\left[\kern-0.15em\left[ A \right]\kern-0.15em\right]_{I} = Algorithm3\left( {\left[\kern-0.15em\left[ P^{\prime } \right]\kern-0.15em\right]_{D} } \right)\), where the grouping variables are those in \({V}_{r}^{g}\). The union aggregation function is implemented by Algorithm 3, which also implements the set variable semantics for computing set grouping variables.

About this article

Cite this article

Galani, T., Papastefanatos, G., Stavrakas, Y. et al. Defining and Detecting Complex Changes on RDF(S) Knowledge Bases. J Data Semant 10, 367–398 (2021). https://doi.org/10.1007/s13740-021-00136-9

  • DOI: https://doi.org/10.1007/s13740-021-00136-9
