Ontology evolution without tears
Introduction
The development of new scientific techniques and the emergence of new high-throughput tools have led to a new information revolution. The nature and the amount of information now available open directions of research that were once in the realm of science fiction. During this information revolution, data-gathering capabilities have greatly outpaced data-analysis techniques, making it a challenge to fully analyze data at the speed at which it is collected. The amount, diversity, and heterogeneity of this information have led to the adoption of data integration systems in order to manage and further process it. However, the integration of these disparate data sources raises several semantic heterogeneity problems.
By accepting an ontology as a point of common reference, naming conflicts are eliminated and semantic conflicts are reduced. Ontologies are used to identify and resolve heterogeneity problems, usually at the schema level, as a means of establishing an explicit formal vocabulary to share. In the past years, ontologies have been used as global schemata in database integration [1], with promising results, for example in the fields of biomedicine and bioinformatics [2], [3]. When using ontologies to integrate data, one is required to produce mappings that link similar concepts or relationships from the ontology (or ontologies) to the sources by way of an equivalence. This is the mapping definition process [4], and the output of this task is the mapping, i.e., a collection of mapping rules. In practice, this process is done manually with the help of graphical user interfaces, and it is a time-consuming, labor-intensive and error-prone activity [5].
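A mapping rule of the kind produced by this process can be sketched as follows. This is a minimal illustration only: the relational source PERSONS(id, name) and the ontology terms (ex:Person, ex:name) are hypothetical names, not taken from the paper.

```python
# Sketch of a GAV-style mapping rule: each row of a hypothetical relational
# source PERSONS(id, name) is translated into RDF-style triples over the
# global ontology. All names here are illustrative assumptions.

def map_persons(rows):
    """Translate PERSONS rows into triples over the ontology vocabulary."""
    triples = []
    for pid, name in rows:
        subj = f"ex:person/{pid}"
        triples.append((subj, "rdf:type", "ex:Person"))
        triples.append((subj, "ex:name", name))
    return triples

sample = map_persons([(1, "Alice")])
```

In practice such rules are written (or generated) per source relation; defining and maintaining them by hand for every relation is precisely the labor-intensive step the paper refers to.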
Despite the great amount of work done in ontology-based data integration, an important problem that most systems tend to ignore is that ontologies are living artifacts and subject to change [4]. Due to the rapid pace of research, ontologies are frequently changed to capture newly acquired knowledge. The problem that arises is the following: when ontologies change, the mappings may become invalid and must somehow be updated or adapted.
In this paper, we address the problem of data integration for evolving RDF/S ontologies that are used as global schemata. We address the problem for a core subset of SPARQL queries that corresponds to a union of conjunctive queries. We argue that ontology change should be considered when designing ontology-based data integration systems. A typical solution would be to regenerate the mappings, and then the dependent artifacts, each time the ontology evolves. However, since this evolution may happen often, the overhead of redefining the mappings each time is significant. Recreating mappings from scratch each time the ontology evolves is widely recognized to be problematic [5], [6], [7]; instead, previously captured information should be reused. However, current approaches along these lines suffer from several drawbacks and are inefficient [8], [9] at handling ontology evolution in a state-of-the-art ontology-based data integration system. The lack of an ideal approach leads us to propose a new mechanism that builds on the latest theoretical advances in the areas of ontology change [10] and query rewriting [11], [12], and handles ontology evolution efficiently and effectively. More specifically:
- We present the architecture of a data integration system, named Evolving Data Integration system, that allows the evolution of the ontology used as global schema. Query answering in our system proceeds in two phases: (a) query rewriting from the latest to the earlier ontology versions, and (b) query rewriting from one ontology version to the local schemata. Since query rewriting to the local schemata has been extensively studied [11], [12], [13], we focus on the layer above and deal only with query rewriting between ontology versions.
- Query processing in the first phase consists of: (i) query expansion, which considers constraints coming from the ontology, and (ii) valid query rewriting, which uses the changes between two ontology versions to produce rewritings among them.
- To identify the changes between ontology versions, we adopt a high-level language of changes. We show that the proposed language possesses salient properties such as uniqueness, invertibility and composability. Uniqueness is a prerequisite for the solution described in this paper, whereas the other two properties are convenient but not necessary for our solution. The sequence of changes between the latest and the other ontology versions is produced automatically at setup time, and those changes are then translated into logical GAV mappings. This translation enables query rewriting by unfolding. Moreover, invertibility is exploited to rewrite queries from past ontology versions to the current one and vice versa, and composability to avoid reconstructing all sequences of changes between the latest and all previous ontology versions.
- Although query rewriting always terminates, the rewritten queries issued using past ontology versions might fail. We show that this is not a flaw of our algorithms but a consequence of information unavailability among ontology versions. To tackle this problem, we propose two solutions: (a) provide best “over-approximations” by means of minimally-containing and minimally-generalized queries, or (b) provide insights into the failure by means of the affecting change operations, thus guiding query redefinition.
- We show that our method is sound and complete and does not impose a significant overhead. Finally, we present an experimental analysis using two real-world ontologies. The experiments performed show the feasibility of our approach and the considerable advantages gained.
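The role that invertibility and composability play can be illustrated with a toy encoding of change operations. The operation names below follow the spirit of a high-level language of changes, but their concrete syntax and the set of operations shown are illustrative assumptions, not the paper's actual language.

```python
from dataclasses import dataclass

# Toy encoding of high-level change operations between ontology versions.
# Operation names mirror the spirit of a language of changes; the concrete
# syntax here is an illustrative assumption.

@dataclass(frozen=True)
class MoveDomain:
    prop: str
    frm: str
    to: str
    def invert(self):
        return MoveDomain(self.prop, self.to, self.frm)

@dataclass(frozen=True)
class AddProperty:
    prop: str
    def invert(self):
        return DeleteProperty(self.prop)

@dataclass(frozen=True)
class DeleteProperty:
    prop: str
    def invert(self):
        return AddProperty(self.prop)

def invert_sequence(changes):
    """Invert a change log: reverse the order and invert each operation."""
    return [c.invert() for c in reversed(changes)]

def compose(e12, e23):
    """Compose the logs O1->O2 and O2->O3 into a single log O1->O3."""
    return list(e12) + list(e23)

log = [MoveDomain("has_cont_point", "Actor", "Person"), DeleteProperty("gender")]
inv = invert_sequence(log)
```

Inverting a log turns a rewriting from new-to-old versions into one from old-to-new, while composing logs avoids recomputing the changes between the latest version and every earlier one from scratch.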
Such a mechanism, providing rewritings among data integration systems that use different ontology versions as global schemata, is flexible, modular and scalable. It can be used on top of any data integration system, independently of the family of mappings (GAV, LAV, GLAV [13]) that each specific system uses between an ontology version and the local schemata. New mappings or ontology versions can be easily and independently introduced without affecting other mappings or ontology versions. Our engine takes on the responsibility of assembling a coherent view of the world out of each specific setting.
This paper is an extended and revised version of a previously published conference paper [14], whereas the implemented system was demonstrated in [15]. However, only the basic ideas were described in [14], without a detailed analysis of the theoretical foundation of the approach. This manuscript adds to the previously published results the related work, the formal properties of the language of changes used to capture ontology evolution, and the specific semantics of the implemented architecture. In addition, the new algorithms that were devised are presented, their correctness is proved, and their complexity is analyzed. Finally, an evaluation of the system is presented for the first time using real and synthetic sets of queries, and a discussion is added to the conclusion of this paper.
The rest of the paper is organized as follows: Section 2 introduces the problem by an example and presents related work. Section 3 presents the architecture of our system and describes its components. Section 4 describes the semantics of such a system and Section 5 elaborates on the aforementioned query rewriting among ontology versions. Finally, Section 6 presents our experimental analysis and Section 7 provides a summary and an outlook for further research.
Motivating example and related work
Consider the example RDF/S ontology shown on the left of Fig. 1. This ontology is used as a point of common reference, describing persons and their contact points (“Cont.Point”). We also have two relational databases DB1 and DB2 mapped to that version of the ontology. Assume now that the ontology designer decides to move the domain of the “has_cont_point” property from the class “Actor” to the class “Person”, and to delete the property “gender”. Moreover, the “street” and the “city” properties
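The two changes stated so far, moving the domain of "has_cont_point" and deleting "gender", already show why queries over the new version may or may not be answerable against sources mapped to the old one. The following sketch illustrates this; the encoding of triple patterns as string tuples, and the specific rewriting shown, are illustrative assumptions.

```python
# Sketch: rewriting triple patterns posed over the new ontology version
# back to the old one, given the two changes of the example. The string
# encoding of patterns is an illustrative assumption.

def rewrite_to_old(patterns):
    """Rewrite query triple patterns from the new version to the old one.

    Returns (rewritten, failed): patterns touching the deleted property
    'gender' cannot be rewritten, since that information is simply
    unavailable in the old version's sources.
    """
    rewritten, failed = [], []
    for s, p, o in patterns:
        if p == "gender":
            failed.append((s, p, o))            # deleted: no valid rewriting
        elif p == "rdf:type" and o == "Person":
            rewritten.append((s, p, "Actor"))   # domain moved Actor -> Person
        else:
            rewritten.append((s, p, o))
    return rewritten, failed
```

Note that rewriting "Person" to "Actor" narrows the query to the instances the old sources can actually supply, in the spirit of the minimally-containing rewritings discussed later, whereas patterns over "gender" can only be reported as failing.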
Evolving data integration
We conceive an Evolving Data Integration system as a collection of data integration systems, each one of them using a different ontology version as global schema. Therefore, we extend the traditional formalism from [13] and define an Evolving Data Integration system as: Definition 3.1 (Evolving Data Integration System). An Evolving Data Integration system is a tuple of the form ⟨O, S, M⟩ where: O is a version of the ontology used as global schema, S is a set of local sources, and M is the mapping between O and S.
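The definition above, one ⟨O, S, M⟩ triple per ontology version collected into a single evolving system, can be sketched as a data structure. The field and class names here are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Sketch of the Evolving Data Integration system as a collection of
# version-specific data integration systems. Names are illustrative.

@dataclass
class DISystem:
    ontology: str   # O: the ontology version used as global schema
    sources: list   # S: the set of local sources
    mapping: dict   # M: the mapping between O and S

@dataclass
class EvolvingDISystem:
    systems: list = field(default_factory=list)  # one DISystem per version

    def latest(self):
        """Return the system built over the most recent ontology version."""
        return self.systems[-1]

e = EvolvingDISystem([
    DISystem("O1", ["DB1", "DB2"], {}),
    DISystem("O2", ["DB3"], {}),
])
```

Queries are posed against `latest()`; answering them over older versions is what the rewriting machinery of the following sections provides.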
Semantics of an evolving data integration system
Now we will define semantics for an Evolving Data Integration system. Our approach is similar to [13] and is sketched in Fig. 3. We start by considering a local database for each set of local sources, i.e., a database that conforms to the local sources of the corresponding data integration system. For example, a local database for the first version conforms to the local sources DB1 and DB2.
Now, based on such a local database, we shall specify the information content of the global schema. We call a database for the global schema a global database. However, since
Query processing
Queries to an Evolving Data Integration system are posed in terms of the global schema. For querying, we adopt a core subset of the SPARQL language corresponding to a union of conjunctive queries [43]. We chose SPARQL since it is currently the standard query language for the semantic web and an official W3C recommendation. Essentially, SPARQL is a graph-matching language: given a data source, a query consists of a pattern which is matched against it, and the values obtained from this matching are processed to give
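The graph-matching semantics of such a conjunctive fragment can be sketched with a naive evaluator: a basic graph pattern is a conjunction of triple patterns, and an answer is a variable binding under which every pattern occurs in the data. The `?x` variable syntax and the tuple encoding of triples are assumptions of this sketch.

```python
# Naive evaluator for a conjunction of triple patterns (a SPARQL basic
# graph pattern) over a set of triples. Encoding choices are illustrative.

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match_pattern(pattern, triple, binding):
    """Try to extend 'binding' so that 'pattern' matches 'triple'."""
    b = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in b and b[p] != t:
                return None      # conflicting binding for the same variable
            b[p] = t
        elif p != t:
            return None          # constant mismatch
    return b

def evaluate(bgp, data, binding=None):
    """Evaluate a conjunction of triple patterns; return all bindings."""
    if binding is None:
        binding = {}
    if not bgp:
        return [binding]
    head, rest = bgp[0], bgp[1:]
    results = []
    for triple in data:
        b = match_pattern(head, triple, binding)
        if b is not None:
            results.extend(evaluate(rest, data, b))
    return results

data = [("alice", "rdf:type", "Person"), ("alice", "has_cont_point", "cp1")]
query = [("?x", "rdf:type", "Person"), ("?x", "has_cont_point", "?c")]
answers = evaluate(query, data)
```

A union of conjunctive queries is then simply the union of the bindings produced by each conjunct evaluated this way.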
Implementation and evaluation
The approach described in this paper was implemented in our exelixis platform [15]. We developed the exelixis platform as a web application, using PHP/jQuery/HTML for the presentation and Java/PHP for implementing the algorithms. The interface is shown in Fig. 11. Using our platform, the user is able to load and visualize one version of an RDF ontology. The visualization is provided either through the jOWL API or the OWLSight
Discussion and conclusion
In this paper, we argue that ontology evolution is a reality and that data integration systems should be aware of it and ready to deal with it. To that end, we presented a novel approach that allows query answering under evolving ontologies without mapping redefinition between each ontology version and the corresponding data sources.
Our architecture is based on a module that can be placed on top of any traditional ontology-based data integration system, enabling ontology evolution. It does so by
Acknowledgments
We would like to thank the reviewers for their valuable comments. This work has been supported by the eHealthMonitor and EURECA projects and has been partly funded by the European Commission under contracts FP7-287509 and FP7-288048.
References (50)
- et al., Contextualizing ontologies, J. Web Sem. (2004)
- et al., Ontologies and databases: the DL-lite approach
- L. Martin, A. Anguita, V. Maojo, E. Bonsma, A.I.D. Bucur, J. Vrijnsen, M. Brochhausen, C. Cocos, H. Stenzhorn, M. ...
- et al., Analyzing the evolution of life science ontologies and mappings
- et al., Ontology change: classification and survey, Knowl. Eng. Rev. (2008)
- et al., Preserving mapping consistency under schema changes, VLDB J. (2004)
- et al., Semantic adaptation of schema mappings when schemas evolve
- C.A. Curino, H.J. Moon, M. Ham, C. Zaniolo, The PRISM workwench: database schema evolution without tears, in: ICDE, ...
- H. Kondylakis, G. Flouris, D. Plexousakis, Ontology & schema evolution in data integration: review and assessment, in: ...
- et al., Schema mapping evolution through composition and inversion
- ...: a unified approach to ontologies and integrity constraints
- Linking data to ontologies, J. Data Semantics X
- Data integration: a theoretical perspective
- Designing information-preserving mapping schemes for XML
- Preserving XML queries during schema evolution
- A versioning management model for ontology-based data warehouses
- Temporal and versioning model for schema evolution in object-oriented databases, Data Knowl. Eng.
- Introducing time into RDF, IEEE Trans. Knowl. Data Eng.
- Tracking changes in RDF(S) repositories