1 Introduction

Ontologies can be valuable sources of domain knowledge for various applications. However, selecting appropriate ontologies requires particular attention. The ontologies must provide a sufficient degree of entity coverage (population completeness in [1, 2]) and a sufficient level of detail (schema completeness in [1, 2]) [3]. In addition, the faultless operation of applications relies on the correctness (accuracy in [1, 2]) and sufficient value coverage (property completeness and interlinking completeness in [1] or column completeness in [2]) of the ontologies. To verify the correctness and completeness of a candidate ontology, the modeled facts must be compared to actual facts, which would be contained in a gold standard data source. Classical ontology engineering methodologies use competency questions to specify requirements and to verify requirement compliance. Expected answers to competency questions that verify the correctness or completeness of facts in an ontology would implicitly also represent a gold standard data source. However, a gold standard data source for real-world data almost never exists. Due to this general lack of gold standard data sources, we proposed directly comparing multiple independent candidate ontologies to approximate their correctness and completeness [4]. Measures of correctness and completeness can then support the selection of ontologies that fulfill the requirements of a particular project.

As an example, consider two candidate ontologies for a text annotation service that contain 100 and 150 relevant entities, respectively. The first contains 100 English and 100 Spanish labels, the second 140 English and 130 Spanish labels. A comparison reveals that 90 of the entities are contained in both ontologies and detects deviations between the ontologies in 40 Spanish labels, caused by errors in the second ontology. Depending on the target languages, this allows a better-informed choice of ontology.
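The numbers of this example can be worked through in a few lines. The following is a minimal sketch only: the variable names are ours, and using the union of both ontologies as the reference set is an illustrative assumption, not a measure defined in this paper.

```python
# Entity counts from the example above.
entities_a, entities_b, shared = 100, 150, 90

# Entities known to at least one ontology (inclusion-exclusion).
union = entities_a + entities_b - shared

# Share of the union covered by each ontology (assumed, rough coverage proxy).
coverage_a = entities_a / union
coverage_b = entities_b / union

# Label counts per language, and the Spanish deviations traced to ontology B.
labels = {"A": {"en": 100, "es": 100}, "B": {"en": 140, "es": 130}}
flagged_es_b = 40
unflagged_es_b = labels["B"]["es"] - flagged_es_b

print(f"union={union}, coverage A={coverage_a:.3f}, coverage B={coverage_b:.3f}")
print(f"unflagged Spanish labels: A={labels['A']['es']}, B={unflagged_es_b}")
```

Under these assumptions, the second ontology offers more English labels, while the first offers more Spanish labels that the comparison did not flag. An unflagged label is not necessarily correct; it merely did not deviate from the other ontology.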

The term ontology comparison is used with different meanings in the literature: (a) the comparison of entire ontologies regarding certain aspects to evaluate or select ontologies, (b) the comparison of different versions of one ontology to highlight changes, (c) the comparison of single entities or sets of entities to calculate recommendations of entities, or (d) the calculation of the similarity of single or a few entities from different ontologies to match or merge these ontologies [4]. In this paper, we focus only on variant (a).

We introduce ABECTO, an ABox evaluation and comparison tool for ontologies. ABECTO implements a framework for comparing multiple ontologies of the same domain. To the best of our knowledge, this is the first software tool that compares ontologies at the ABox level to approximate their correctness and completeness. In the remainder of this article, we introduce the functionality of ABECTO in Sect. 2, explain our strategy for handling different modeling approaches in Sect. 3, describe the implementation of ABECTO in Sect. 4, and outline the demonstration in Sect. 5.

2 System Overview

ABECTO implements our framework for ontology ABox comparison described in [4]. The framework consists of five components, as shown in Fig. 1: (a) a source component to load ontologies, (b) a transformation component to add deduced axioms to the ontologies in preparation for further processing, (c) a mapping component to map the resources of the ontologies, (d) a comparison component to provide measurements of the ontologies, and (e) an evaluation component to identify potential mistakes in the ontologies.

Fig. 1. Schematic of the comparison framework implemented in ABECTO. The order of the transformation and mapping processes is up to the user.

For each component, ABECTO provides several processors, each of which offers a specific functionality. Users arrange these processors into a processing pipeline to define the comparison process.
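The idea of arranging processors into a pipeline can be illustrated with a small sketch. It is not ABECTO's code (ABECTO is a Java service operating on RDF models): the names Node, run, and map_by_label are hypothetical, and the "models" are plain Python sets of triples chosen for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Node:
    """One pipeline step: a processor function plus the nodes it reads from."""
    name: str
    processor: Callable[[List[set]], set]
    inputs: List["Node"] = field(default_factory=list)

def run(nodes: List[Node]) -> Dict[str, set]:
    """Execute nodes in dependency order and return each node's output model."""
    results: Dict[str, set] = {}
    pending = list(nodes)
    while pending:
        # A node is ready once all of its input nodes have produced a result.
        ready = [n for n in pending if all(i.name in results for i in n.inputs)]
        if not ready:
            raise ValueError("pipeline contains a cycle or a missing input node")
        for node in ready:
            results[node.name] = node.processor([results[i.name] for i in node.inputs])
            pending.remove(node)
    return results

# Toy source nodes: each "loads" a set of (subject, predicate, object) triples.
source_a = Node("source_a", lambda _: {("a:Berlin", "label", "Berlin")})
source_b = Node("source_b", lambda _: {("b:berlin", "label", "Berlin")})

def map_by_label(models: List[set]) -> set:
    """Toy mapping step: map resources of two models that share a label."""
    left, right = models
    return {(s1, "mapsTo", s2)
            for s1, p1, o1 in left for s2, p2, o2 in right
            if p1 == p2 == "label" and o1 == o2}

mapping = Node("mapping", map_by_label, [source_a, source_b])
results = run([mapping, source_a, source_b])  # listing order does not matter
print(results["mapping"])
```

Resolving the execution order from the declared inputs, rather than from the order in which nodes are listed, is one simple way to let users rearrange transformation and mapping steps freely.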

3 Handling of Different Modeling Approaches

The comparison of the ABoxes requires identifying corresponding facts of the ontologies. However, different ontologies of the same domain might use different approaches to model certain aspects of this domain. For example, there might be (a) properties corresponding to a chain of properties, (b) anonymous individuals corresponding to named individuals, (c) data properties corresponding to annotation properties, or (d) classes corresponding to individuals [4].

To meet this challenge, the sets of resources and their comparable properties are described by so-called categories. A category is defined by one SPARQL GroupGraphPattern [5] (the content of the WHERE clause) per ontology. The variable with the same name as the category represents the resource to compare. The bindings of all other variables that share a name across the patterns are compared. This allows the facts to compare to be defined in a way that covers all of the cases above: (a) resources and variables can be linked by properties as well as by complex property paths, (b) unambiguous IRIs can be created from key property values, (c) resources and variables can be linked by data properties as well as by annotation properties, and (d) the resource may represent a class as well as an individual. In further processing, these patterns are used to obtain the facts for the ontology comparison.
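To make the role of the variable names concrete, the following sketch defines one hypothetical category, city, for two ontologies and derives which variables would be compared. All prefixes, property names, and helper functions are invented for illustration, and the intersection rule is our reading of "equally named variables" for the two-ontology case; ABECTO's actual handling may differ.

```python
import re

# Hypothetical patterns for the category "city" in two ontologies that model
# the same facts differently (prefix declarations omitted for brevity).
CATEGORY = "city"
PATTERNS = {
    "ontology_a": "?city a a:City ; rdfs:label ?name ; a:population ?population .",
    "ontology_b": ("?city a b:Town ; skos:prefLabel ?name ; "
                   "b:census/b:count ?population ; b:mayor ?mayor ."),
}

def variables(pattern: str) -> set:
    """Return the names of all SPARQL variables (?x or $x) used in a pattern."""
    return set(re.findall(r"[?$](\w+)", pattern))

def comparable_variables(patterns: dict, category: str) -> set:
    """Variables, other than the category variable, that every pattern uses."""
    shared = set.intersection(*(variables(p) for p in patterns.values()))
    return shared - {category}

def select_query(pattern: str, category: str, compared: set) -> str:
    """Wrap a category pattern into a SELECT query over the compared variables."""
    projection = " ".join(f"?{v}" for v in [category, *sorted(compared)])
    return f"SELECT {projection} WHERE {{ {pattern} }}"

compared = comparable_variables(PATTERNS, CATEGORY)
for ontology, pattern in PATTERNS.items():
    print(ontology, "->", select_query(pattern, CATEGORY, compared))
```

Here the second pattern reaches the population through a property path and uses an annotation-style label property, yet both patterns expose the same variable names, so name and population would be compared while mayor, bound in only one pattern, would not.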

4 Implementation

ABECTO is implemented as a Java HTTP REST service based on Apache Jena¹ and Spring² to provide a convenient interface for user interfaces or other applications. The size of the compared ontologies is mainly limited by the memory required to represent them. Therefore, we expect ABECTO to be able to process large ontologies on appropriate hardware. A Python module provides convenience functions for using ABECTO inside a Jupyter notebook³, hiding the raw HTTP requests. This allows an easy setup of reproducible ontology comparison projects. However, the result presentation in the Jupyter notebook interface for ABECTO is only suitable for smaller ontologies. To support large ontologies, an independent interface such as a stand-alone web application would be needed. The sources of ABECTO are publicly available under the Apache 2.0 license [6].

In ABECTO, ontologies are compared within a project. A project consists of several ontologies and a processing pipeline. Each node of the pipeline represents a processor with a particular configuration. A processor is a Java class with specified methods that generate an output RDF model. The start nodes of the pipeline are the nodes representing source processors, each of which loads an RDF model from an external source. To support modularized ontologies, multiple source nodes may belong to one ontology. Nodes of other processors require at least one input node. These processors can be divided into (a) transformation processors, which extend the input RDF model, (b) mapping processors, which provide resource mappings between the input RDF models of different ontologies, and (c) meta processors, which calculate comparative meta data from the input RDF models. The comparative meta data include measurements, such as resource counts; identified deviations between mapped resources; issues, such as a literal encountered where a resource was expected; and categories, which define the sets of resources and their properties to compare. The output RDF models of source and transformation processors belong to a specific ontology, whereas the output RDF models of mapping and meta processors do not. Subsequent processors therefore treat the two kinds of models differently. The following processors are already available in ABECTO:

  • RdfFileSourceProcessor: Loads an RDF document from the local file system.

  • JaroWinklerMappingProcessor: Provides mappings based on Jaro-Winkler Similarity [7] of string property values using our implementation from [8].

  • ManualMappingProcessor: Enables users to manually adjust the mappings by providing or suppressing mappings.

  • RelationalMappingProcessor: Provides mappings based on the mappings of referenced resources.

  • OpenlletReasoningProcessor: Infers the logical consequences of the input RDF models utilizing the Openllet Reasoner⁴ to generate additional triples.

  • SparqlConstructProcessor: Applies a given SPARQL Construct Query to the input RDF models to generate additional triples.

  • CategoryCountProcessor: Measures the number of resources and property values per category.

  • LiteralDeviationProcessor: Detects deviations between the property values of mapped resources as defined in the categories.

  • ManualCategoryProcessor: Enables users to manually define resource categories and their properties.

  • ResourceDeviationProcessor: Detects deviations between the resource references of mapped resources as defined in the categories.
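As an illustration of similarity-based mapping, the sketch below implements the textbook Jaro-Winkler similarity and maps resources whose labels exceed a threshold. It is not the implementation from [8]: the function names, the threshold of 0.9, and the toy labels are assumptions made for this example.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity of two strings, in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Characters match if equal and at most `window` positions apart.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3

def jaro_winkler(s1: str, s2: str, scaling: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings sharing a common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return base + prefix * scaling * (1 - base)

def map_by_label(left: dict, right: dict, threshold: float = 0.9) -> set:
    """Map resources whose labels reach the (assumed) similarity threshold."""
    return {(l, r) for l, l_label in left.items() for r, r_label in right.items()
            if jaro_winkler(l_label, r_label) >= threshold}

# Toy labels; the second ontology contains a misspelled label.
labels_a = {"a:Berlin": "Berlin", "a:Paris": "Paris"}
labels_b = {"b:1": "Berlin", "b:2": "Pariss"}
print(map_by_label(labels_a, labels_b))
```

A similarity-based mapping tolerates small spelling differences but can also produce false matches, which is where manual adjustment of mappings becomes useful.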

We plan to add further processors in the near future, including:

  • A mapping processor that employs well-known matching libraries via the Alignment API [9].

  • A mapping processor that reuses mappings contained in the ontologies.

  • A mapping processor that provides transitive mappings based on results of other mappings.

  • A meta processor that utilizes mark and recapture techniques [10] to measure the completeness of ontologies.

  • A source processor that loads an RDF document from a URL.

  • A source processor that imports triples of a specified scope from a SPARQL endpoint.

  • A source processor that utilizes SPARQL Generate [11] to load comparison data from non-RDF documents.
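The planned completeness measurement can be sketched with a simple estimator. The list above only names mark and recapture techniques [10]; the sketch below uses the Lincoln-Petersen estimator as one common choice, which is our assumption, and treats two ontologies as two independent samples of the same domain. The function name is hypothetical, and the numbers are those of the introductory example.

```python
def lincoln_petersen(n1: int, n2: int, overlap: int) -> float:
    """Estimate the total population size from two independent samples.

    n1, n2: number of entities in each ontology; overlap: entities in both.
    """
    if overlap <= 0:
        raise ValueError("the estimate requires at least one shared entity")
    return n1 * n2 / overlap

# Entity counts from the introductory example: 100 and 150 entities, 90 shared.
estimated_total = lincoln_petersen(100, 150, 90)
completeness_a = 100 / estimated_total
completeness_b = 150 / estimated_total

print(f"estimated domain size: {estimated_total:.1f}")
print(f"estimated completeness: A={completeness_a:.2f}, B={completeness_b:.2f}")
```

The estimate relies on the independence of the ontologies. If one ontology was derived from the other, the overlap is inflated and the completeness is overestimated.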

The meta data models generated by the nodes can be used to generate reports. These reports might contain the calculated measurements, deviations and issues. Figure 2 shows an example report generated in a Jupyter Notebook.

Fig. 2. Screenshot of an example report generated in a Jupyter Notebook. The report shows one type of measurement (number of resources and property values per category), encountered deviations, and encountered issue of a comparison of three ontologies.
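How measurements and deviations could be derived and rendered as a report is sketched below. The data structures, function names, and values are invented toy examples; ABECTO itself stores these results as RDF meta data models, and its report layout differs.

```python
# Invented toy data: bindings of one category per ontology,
# as resource -> {variable: value}, plus mappings between the resources.
bindings = {
    "A": {"a:x": {"label": "X", "size": 10}, "a:y": {"label": "Y", "size": 20}},
    "B": {"b:1": {"label": "X", "size": 10}, "b:2": {"label": "Y", "size": 25}},
}
mappings = [("a:x", "b:1"), ("a:y", "b:2")]

def count_values(bindings: dict) -> dict:
    """Count resources and property values per ontology."""
    return {
        onto: {"resources": len(res), "values": sum(len(v) for v in res.values())}
        for onto, res in bindings.items()
    }

def find_deviations(bindings: dict, mappings: list, left: str = "A", right: str = "B") -> list:
    """Return (left resource, right resource, variable, left value, right value)
    for every shared variable whose values differ between mapped resources."""
    found = []
    for l_res, r_res in mappings:
        l_vals, r_vals = bindings[left][l_res], bindings[right][r_res]
        for var in sorted(l_vals.keys() & r_vals.keys()):
            if l_vals[var] != r_vals[var]:
                found.append((l_res, r_res, var, l_vals[var], r_vals[var]))
    return found

def report(bindings: dict, mappings: list) -> str:
    """Render measurements and deviations as plain text."""
    lines = ["Measurements:"]
    for onto, m in count_values(bindings).items():
        lines.append(f"  {onto}: {m['resources']} resources, {m['values']} values")
    lines.append("Deviations:")
    for l_res, r_res, var, l_val, r_val in find_deviations(bindings, mappings):
        lines.append(f"  {l_res} / {r_res}: {var} = {l_val!r} vs {r_val!r}")
    return "\n".join(lines)

print(report(bindings, mappings))
```

A deviation only states that two ontologies disagree; deciding which of the two values is wrong remains a task for the user.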

5 Demonstration

We will demonstrate how users can utilize ABECTO to compare and evaluate ontologies. We will provide sets of real-world RDF documents with prepared project definitions and category descriptions. The projects are managed inside Jupyter notebooks. A tutorial notebook is available and can be executed online⁵ using Binder [12]. Users will be able to manipulate and execute the project pipelines and examine the resulting comparison and evaluation reports.