Elsevier

Journal of Web Semantics

Volume 19, March 2013, Pages 42-58

Ontology evolution without tears

https://doi.org/10.1016/j.websem.2013.01.001

Abstract

The evolution of ontologies is an undisputed necessity in ontology-based data integration. Yet, few research efforts have focused on reflecting the evolution of ontologies used as global schemata onto the underlying data integration systems. In most approaches, when ontologies change, their relations with the data sources, i.e., the mappings, are recreated manually, a process known to be error-prone and time-consuming. In this paper, we provide a solution that allows query answering in data integration systems under evolving ontologies without mapping redefinition. This is achieved by rewriting queries among ontology versions and then forwarding them to the underlying data integration systems to be answered. To this end, we first automatically detect and describe the changes among ontology versions using a high-level language of changes. Those changes are interpreted as sound global-as-view (GAV) mappings and used to produce equivalent rewritings among ontology versions. Whenever equivalent rewritings cannot be produced, we (a) guide query redefinition or (b) provide the best “over-approximations”, i.e., the minimally-containing and minimally-generalized rewritings. We prove that our approach imposes only a small overhead over traditional query rewriting algorithms and that it is modular and scalable. Finally, we show that it can greatly reduce the human effort spent, since continuous mapping redefinition is no longer necessary.

Introduction

The development of new scientific techniques and the emergence of new high-throughput tools have led to a new information revolution. The nature and amount of information now available open directions of research that were once in the realm of science fiction. During this information revolution, data gathering capabilities have greatly surpassed data analysis techniques, making it a challenge to fully analyze the data at the speed at which it is collected. The amount, diversity, and heterogeneity of that information have led to the adoption of data integration systems in order to manage and further process it. However, the integration of these disparate data sources raises several semantic heterogeneity problems.

By accepting an ontology as a point of common reference, naming conflicts are eliminated and semantic conflicts are reduced. Ontologies are used to identify and resolve heterogeneity problems, usually at the schema level, as a means of establishing an explicit formal vocabulary to share. Over the past years, ontologies have been used as global schemata in database integration [1], obtaining promising results, for example in the fields of biomedicine and bioinformatics [2], [3]. When using ontologies to integrate data, one is required to produce mappings that link similar concepts or relationships from the ontology/ies to the sources by way of an equivalence. This is the mapping definition process [4], and the output of this task is the mapping, i.e., a collection of mapping rules. In practice, this process is done manually with the help of graphical user interfaces, and it is a time-consuming, labor-intensive and error-prone activity [5].

Despite the great amount of work done in ontology-based data integration, an important problem that most of the systems tend to ignore is that ontologies are living artifacts and subject to change [4]. Due to the rapid development of research, ontologies are frequently changed to depict the new knowledge that is acquired. The problem that occurs is the following: when ontologies change, the mappings may become invalid and should somehow be updated or adapted.

In this paper, we address the problem of data integration for evolving RDF/S ontologies that are used as global schemata. We address the problem for a core subset of SPARQL queries that corresponds to a union of conjunctive queries. We argue that ontology change should be considered when designing ontology-based data integration systems. A typical solution would be to regenerate the mappings, and then the dependent artifacts, each time the ontology evolves. However, as this evolution might happen often, the overhead of redefining the mappings each time is significant. Recreating mappings from scratch each time the ontology evolves is widely recognized to be problematic [5], [6], [7]; instead, previously captured information should be reused. However, current approaches that attempt this suffer from several drawbacks and are inefficient [8], [9] at handling ontology evolution in a state-of-the-art ontology-based data integration system. The lack of an ideal approach leads us to propose a new mechanism that builds on the latest theoretical advances in the areas of ontology change [10] and query rewriting [11], [12] and that handles ontology evolution efficiently and effectively. More specifically:

  • We present the architecture of a data integration system, named Evolving Data Integration system, that allows the evolution of the ontology used as global schema. Query answering in our system proceeds in two phases: (a) query rewriting from the latest to the earlier ontology versions and (b) query rewriting from one ontology version to the local schemata. Since query rewriting to the local schemata has been extensively studied [11], [12], [13], we focus on a layer above and deal only with the query rewriting between ontology versions.

  • The query processing in the first step consists of: (i) query expansion that considers constraints coming from the ontology, and (ii) valid query rewriting that uses the changes between two ontology versions to produce rewritings among them.

  • In order to identify the changes between ontology versions, we adopt a high-level language of changes. We show that the proposed language possesses salient properties such as uniqueness, invertibility and composability. Uniqueness is a prerequisite for the solution described in this paper, whereas the other two properties are desirable but not necessary for our solution. The sequence of changes between the latest and the other ontology versions is produced automatically at setup time, and those changes are then translated into logical GAV mappings. This translation enables query rewriting by unfolding. Moreover, invertibility is exploited to rewrite queries from past ontology versions to the current one, and vice versa, and composability is exploited to avoid reconstructing all sequences of changes between the latest and all previous ontology versions.

  • Although query rewriting always terminates, the rewritten queries issued using past ontology versions might fail. We show that this is not a deficiency of our algorithms but a consequence of information unavailability among ontology versions. To tackle this problem, we propose two solutions: (a) provide best “over-approximations” by means of minimally-containing and minimally-generalized queries, or (b) provide insights into the failure by means of the affecting change operations, thus guiding query redefinition.

  • We show that our method is sound and complete and does not impose a significant overhead. Finally, we present our experimental analysis using two real-world ontologies. Experiments performed show the feasibility of our approach and the considerable advantages gained.
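The rewriting-by-unfolding step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the change operations (a property rename and a property addition) and all names are hypothetical, and each change is interpreted as a GAV rule over the triple patterns of a conjunctive query body. It also shows the failure case from the last bullet, where a query uses a property with no counterpart in the target ontology version.

```python
# Sketch of query rewriting by unfolding GAV rules derived from change
# operations between two ontology versions. Operation names and the
# property names below are illustrative, not the paper's exact language.

def rename_property(new_name, old_name):
    """The new version renamed old_name to new_name; the GAV rule maps
    a triple pattern over the new version back to the old name."""
    def rule(s, p, o):
        return (s, old_name, o) if p == new_name else (s, p, o)
    return rule

def add_property(name):
    """Property added in the new version: it has no equivalent in the
    older version, so rewriting a pattern that uses it must fail."""
    def rule(s, p, o):
        if p == name:
            raise LookupError(f"no equivalent rewriting: '{name}' "
                              "is unavailable in the target version")
        return (s, p, o)
    return rule

def unfold(body, rules):
    """Rewrite a conjunctive query body (a list of triple patterns;
    variables start with '?') through every GAV rule in sequence."""
    for rule in rules:
        body = [rule(*pattern) for pattern in body]
    return body

changes = [rename_property("has_cont_point", "has_contact"),  # hypothetical rename
           add_property("email")]                             # hypothetical addition

# Query over the latest version: rewriting succeeds using the older name.
q1 = [("?x", "type", "Person"), ("?x", "has_cont_point", "?y")]
print(unfold(q1, changes))

# Query using a property absent from the older version: rewriting fails,
# and the affecting change operation can be reported to guide redefinition.
q2 = [("?x", "email", "?e")]
try:
    unfold(q2, changes)
except LookupError as e:
    print(e)
```

Because each change operation is unfolded locally on triple patterns, the cost of this step grows with the query size and the length of the change sequence, which is consistent with the small overhead claimed above.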

Such a mechanism, that provides rewritings among data integration systems that use different ontology versions as global schemata, is flexible, modular and scalable. It can be used on top of any data integration system—independently of the family of the mappings that each specific data integration system uses to define mappings between one ontology version and the local schemata (GAV, LAV, GLAV [13]). New mappings or ontology versions can be easily and independently introduced without affecting other mappings or other ontology versions. Our engine takes the responsibility of assembling a coherent view of the world out of each specific setting.

This paper is an extended and revised version of a previously published conference paper [14], whereas the implemented system was demonstrated in [15]. However, only the basic ideas were described in [14], without a detailed analysis of the theoretical foundation of the approach. This manuscript adds to the previously published results the related work, the formal properties of the language of changes used to capture ontology evolution, and the specific semantics of the implemented architecture. In addition, the new algorithms that were created are presented, their correctness is proved and their complexity is analyzed. Finally, an evaluation of the system is presented for the first time using real and synthetic sets of queries, and a discussion is added to the conclusion of this paper.

The rest of the paper is organized as follows: Section 2 introduces the problem by an example and presents related work. Section 3 presents the architecture of our system and describes its components. Section 4 describes the semantics of such a system and Section 5 elaborates on the aforementioned query rewriting among ontology versions. Finally, Section 6 presents our experimental analysis and Section 7 provides a summary and an outlook for further research.

Motivating example and related work

Consider the example RDF/S ontology shown on the left of Fig. 1. This ontology is used as a point of common reference, describing persons and their contact points (“Cont.Point”). We also have two relational databases DB1 and DB2 mapped to that version of the ontology. Assume now that the ontology designer decides to move the domain of the “has_cont_point” property from the class “Actor” to the class “Person”, and to delete the property “gender”. Moreover, the “street” and the “city” properties

Evolving data integration

We conceive an Evolving Data Integration system as a collection of data integration systems, each one of them using a different ontology version as global schema. Therefore, we extend the traditional formalism from [13] and define an Evolving Data Integration system as:

Definition 3.1 Evolving Data Integration System

An Evolving Data Integration system I is a tuple of the form ((O1, S1, M1), …, (Om, Sm, Mm)) where:

  • Oi is a version of the ontology used as global schema,

  • Si is a set of local sources and

  • Mi is the mapping between Si and Oi (1 ≤ i ≤ m).
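Definition 3.1 can be pictured as plain data: a sequence of (ontology version, local sources, mapping) triples, with queries posed over the latest version Om. The following sketch uses placeholder types and made-up source names; it is an illustration of the structure, not the paper's formal objects.

```python
# Minimal sketch of Definition 3.1: an Evolving Data Integration system
# as a sequence of (Oi, Si, Mi) triples. Field types are placeholders.
from dataclasses import dataclass

@dataclass(frozen=True)
class IntegrationLayer:
    ontology: str    # Oi: the ontology version used as global schema
    sources: tuple   # Si: the set of local sources
    mapping: dict    # Mi: the mapping between Si and Oi

@dataclass(frozen=True)
class EvolvingDIS:
    layers: tuple    # ((O1, S1, M1), ..., (Om, Sm, Mm))

    def latest(self):
        """Queries are posed over the latest ontology version, Om."""
        return self.layers[-1]

# Hypothetical instance with two ontology versions and example sources.
I = EvolvingDIS(layers=(
    IntegrationLayer("O1", ("DB1", "DB2"), {"Actor": "DB1.actors"}),
    IntegrationLayer("O2", ("DB3",), {"Person": "DB3.persons"}),
))
print(I.latest().ontology)
```

Each layer is a self-contained traditional data integration system, which is what lets new ontology versions and mappings be added without touching the existing ones.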

Semantics of an evolving data integration system

Now we will define semantics for an Evolving Data Integration system I. Our approach is similar to [13] and is sketched in Fig. 3. We start by considering a local database for each (Oi, Si, Mi), i.e., a database Di that conforms to the local sources of Si. For example, D1 is a local database for (O1, S1, M1) that conforms to the local sources S11, S12 and S13.

Now, based on Di, we shall specify the information content of the global schema Oi. We call a database for Oi a global database. However, since

Query processing

Queries to I are posed in terms of the global schema Om. For querying, we adopt a core subset of the SPARQL language corresponding to a union of conjunctive queries [43]. We chose SPARQL since it is currently the standard query language for the semantic web and has become an official W3C recommendation. Essentially, SPARQL is a graph-matching language. Given a data source, a query consists of a pattern which is matched against, and the values obtained from this matching are processed to give
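The SPARQL fragment assumed here can be made concrete with a small example. The query below and its translation are illustrative (the class and property names are taken from the motivating example, the prefix is made up): a SELECT query whose pattern is a UNION of basic graph patterns corresponds to a union of conjunctive queries (UCQ), one conjunctive query per branch.

```python
# Illustrative SPARQL query in the assumed fragment: a union of two
# basic graph patterns, i.e., a union of conjunctive queries.
sparql = """
SELECT ?x ?y WHERE {
  { ?x a :Actor .  ?x :has_cont_point ?y }
  UNION
  { ?x a :Person . ?x :has_cont_point ?y }
}
"""

# Hand-translated UCQ form: each entry is (head variables, body), where
# the body is a list of triple patterns. This representation is a sketch.
ucq = [
    (("?x", "?y"), [("?x", "type", ":Actor"),
                    ("?x", ":has_cont_point", "?y")]),
    (("?x", "?y"), [("?x", "type", ":Person"),
                    ("?x", ":has_cont_point", "?y")]),
]

# A UCQ is answered by evaluating each conjunctive query independently and
# unioning the results, so each branch can be rewritten between ontology
# versions on its own. The heads must be union-compatible:
assert all(head == ucq[0][0] for head, _ in ucq)
print(len(ucq), "conjunctive queries")
```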

Implementation and evaluation

The approach described in this paper was implemented in our exelixis platform [15]. We developed the exelixis platform as a web application using PHP/jQuery/HTML for the presentation and Java/PHP for implementing the algorithms. The interface is shown in Fig. 11. Using our platform, the user is able to load and visualize one version of an RDF ontology. The visualization is provided either through the jOWL API or the OWLSight

Discussion and conclusion

In this paper, we argue that ontology evolution is a reality and that data integration systems should be aware of it and ready to deal with it. To this end, we presented a novel approach that allows query answering under evolving ontologies without mapping redefinition between each ontology version and the corresponding data sources.

Our architecture is based on a module that can be placed on top of any traditional ontology-based data integration system, enabling ontology evolution. It does so by

Acknowledgments

We would like to thank the reviewers for their valuable comments. This work has been supported by the eHealthMonitor and EURECA projects and has been partly funded by the European Commission under contracts FP7-287509 and FP7-288048.

References

  • P. Bouquet et al., Contextualizing ontologies, J. Web Sem. (2004)
  • D. Calvanese et al., Ontologies and databases: the DL-Lite approach
  • L. Martin, A. Anguita, V. Maojo, E. Bonsma, A.I.D. Bucur, J. Vrijnsen, M. Brochhausen, C. Cocos, H. Stenzhorn, M. ...
  • M. Hartung et al., Analyzing the evolution of life science ontologies and mappings
  • G. Flouris et al., Ontology change: classification and survey, Knowl. Eng. Rev. (2008)
  • Y. Velegrakis et al., Preserving mapping consistency under schema changes, VLDB J. (2004)
  • C. Yu et al., Semantic adaptation of schema mappings when schemas evolve
  • C.A. Curino, H.J. Moon, M. Ham, C. Zaniolo, The PRISM workbench: database schema evolution without tears, in: ICDE, ...
  • H. Kondylakis, G. Flouris, D. Plexousakis, Ontology & schema evolution in data integration: review and assessment, in: ...
  • R. Fagin et al., Schema mapping evolution through composition and inversion
  • V. Papavassiliou, G. Flouris, I. Fundulaki, D. Kotzinos, V. Christophides, On detecting high-level changes in RDF/S ...
  • A. Calì et al., Datalog±: a unified approach to ontologies and integrity constraints
  • A. Poggi et al., Linking data to ontologies, J. Data Semantics X (2008)
  • M. Lenzerini, Data integration: a theoretical perspective
  • H. Kondylakis, D. Plexousakis, Ontology evolution in data integration: query rewriting to the rescue, in: International ...
  • H. Kondylakis, D. Plexousakis, Exelixis: evolving ontology-based data integration system, in: SIGMOD, 2011, pp. ...
  • D. Barbosa et al., Designing information-preserving mapping schemes for XML
  • M.M. Moro et al., Preserving XML queries during schema evolution
  • S. Rizzi, M. Golfarelli, X-time: schema versioning and cross-version querying in data warehouses, in: ICDE, 2007, pp. ...
  • D.N. Xuan et al., A versioning management model for ontology-based data warehouses
  • H. Bounif, Schema repository for database schema evolution, in: P. Rachel (Ed.), DEXA, 2006, pp. ...
  • N. Edelweiss et al., Temporal and versioning model for schema evolution in object-oriented databases, Data Knowl. Eng. (2005)
  • H.J. Moon, C.A. Curino, C. Zaniolo, Scalable architecture and query optimization for transaction-time DBs with evolving ...
  • C. Gutierrez et al., Introducing time into RDF, IEEE Trans. Knowl. Data Eng. (2007)
  • D. Ognyanov et al., Tracking changes in RDF(S) repositories