Class expression learning for ontology engineering

doi:10.1016/j.websem.2011.01.001

Journal of Web Semantics

Volume 9, Issue 1, March 2011, Pages 71-81

https://doi.org/10.1016/j.websem.2011.01.001

Abstract

While the number of knowledge bases in the Semantic Web increases, the maintenance and creation of ontology schemata still remain a challenge. In particular, creating class expressions constitutes one of the more demanding aspects of ontology engineering. In this article we describe how to adapt a semi-automatic method for learning OWL class expressions to the ontology engineering use case. Specifically, we describe how to extend an existing learning algorithm for the class learning problem. We perform rigorous performance optimization of the underlying algorithms to provide instant suggestions to the user. We also present two plugins, which use the algorithm, for the popular Protégé and OntoWiki ontology editors and provide a preliminary evaluation on real ontologies.

Section snippets

Introduction and motivation

The Semantic Web has recently seen a rise in the availability and usage of knowledge bases, as can be observed within the Linking Open Data Initiative, the TONES and Protégé ontology repositories, or the Watson search engine. Despite this growth, there is still a lack of knowledge bases that consist of sophisticated schema information and instance data adhering to this schema. Several knowledge bases, e.g. in the life sciences, only consist of schema information, while others are, to a large

Preliminaries

For an introduction to OWL and description logics, we refer to [4] and [17].

Finding a suitable heuristic

A heuristic measures how well a given class expression fits a learning problem and is used to guide the search in a learning process. To define a suitable heuristic, we first need to address the question of how to measure the accuracy of a class expression. We introduce several heuristics, which can be used for CELOE and later evaluate them.
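As a rough illustration of what such an accuracy measure can look like, the sketch below scores a candidate class expression by comparing the individuals it retrieves with a reference set of individuals. It assumes, purely for illustration, that the individuals already asserted for the class being described serve as that reference set. The function names `f_measure` and `jaccard` and the sample individuals are hypothetical, and the two measures shown are generic set-overlap scores, not necessarily the heuristics evaluated in the article.

```python
from typing import Set


def f_measure(retrieved: Set[str], reference: Set[str], beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall over two sets of individuals."""
    overlap = len(retrieved & reference)
    if overlap == 0:
        # Also covers the case where either set is empty.
        return 0.0
    precision = overlap / len(retrieved)
    recall = overlap / len(reference)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def jaccard(retrieved: Set[str], reference: Set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = retrieved | reference
    return len(retrieved & reference) / len(union) if union else 0.0


# Hypothetical data: instances of the class to describe, and the
# individuals retrieved for one candidate class expression.
reference = {"anna", "bob", "carl", "dora"}
retrieved = {"anna", "bob", "carl", "xena"}

score_f1 = f_measure(retrieved, reference)
score_jaccard = jaccard(retrieved, reference)
```

A search procedure could rank candidate expressions by such a score. Raising `beta` above 1 weights recall more heavily than precision, which is one way to penalise candidates that miss existing instances of the class.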

We cannot simply use supervised learning from examples right away, since we do not have positive and negative examples available. We can try to tackle this

Efficient heuristic computation

Most of the runtime of a learning algorithm is spent on computing heuristic values, since they require expensive reasoner requests. In particular, retrieval operations are often required. Performing an instance retrieval can be very expensive for large knowledge bases. Depending on the ontology schema, this may require instance checks for many or even all objects in the knowledge base. Furthermore, a machine learning algorithm easily needs to compute the score for thousands of expressions due

Adaptation of the learning algorithm

While the major change compared to other supervised learning algorithms for OWL is the previously described heuristic, we also made further modifications. The goal of those changes is to adapt the learning algorithm to the ontology engineering scenario: for example, the algorithm was modified in order to introduce a strong bias towards short class expressions. This means that the algorithm is less likely to produce long class expressions, but is almost guaranteed to find any suitable short

The Protégé plugin

After implementing and testing the described learning algorithm, we integrated it into Protégé and OntoWiki. Together with the Protégé developers, we extended the Protégé 4 plugin mechanism to be able to seamlessly integrate the DL-Learner plugin as an additional method to create class expressions. This means that the knowledge engineer can use the algorithm exactly where it is needed without any additional configuration steps. The plugin has also become part of the official Protégé 4

The OntoWiki plugin

Analogous to Protégé, we created a similar plugin for OntoWiki [3]. OntoWiki is a lightweight ontology editor, which allows distributed and collaborative editing of knowledge bases. It focuses on wiki-like, simple and intuitive authoring of semantic content, e.g. through inline editing of RDF content, and provides different views on instance data.

Recently, a fine-grained plugin mechanism and extensions architecture was added to OntoWiki. The DL-Learner plugin is technically realised by

Evaluation

To evaluate the suggestions made by our learning algorithm, we tested it on a variety of real-world ontologies of different sizes and domains. Please note that we intentionally do not perform an evaluation of the machine learning technique as such on existing benchmarks, since we build on the base algorithm already evaluated in detail in [26]. It was shown that this algorithm is superior to other supervised learning algorithms for OWL and at least competitive with the state of the art in ILP.

Related work

Related work can be categorised into two areas: the first is supervised machine learning in OWL/DLs, and the second is work on (semi-)automatic ontology engineering methods.

Early work on supervised learning in description logics goes back to e.g. [9], [10], which used so-called least common subsumers to solve the learning problem (a modified variant of the problem defined in this article). Later, [7] invented a refinement operator for $\mathcal{ALER}$ and proposed to solve the problem by using a

Conclusions and future work

We presented the CELOE learning method specifically designed for extending OWL ontologies. Five heuristics were implemented and analysed in conjunction with CELOE along with several performance improvements. A method for approximating heuristic values has been introduced, which is useful beyond the ontology engineering scenario to solve the challenge of dealing with a large number of examples in ILP [35]. Furthermore, we biased the algorithm towards short solutions and implemented optimisations

References (35)

F. Baader et al. Computing the least common subsumer w.r.t. a background terminology. Journal of Applied Logic (2007)
A. Agresti. An Introduction to Categorical Data Analysis (1997)
A. Agresti et al. Approximate is better than “exact” for interval estimation of binomial proportions. The American Statistician (1998)
S. Auer, S. Dietzold, T. Riechert, Ontowiki – A Tool for Social, Semantic Collaboration, in: ISWC 2006, vol. 4273 of...
F. Baader et al. Completing description logic knowledge bases using formal concept analysis
L. Badea et al. A refinement operator for description logics
A. Blumer et al. Occam’s razor
W.W. Cohen et al. Computing least common subsumers in description logics
W.W. Cohen et al. Learning the CLASSIC description logic: theoretical and experimental results
C. d’Amato et al. A semantic similarity measure for expressive description logics
C. d’Amato et al. A note on the evaluation of inductive concept classification procedures
C. d’Amato et al. Query answering and ontology population: An inductive approach
F. Esposito et al. Knowledge-intensive induction of terminologies from metadata
N. Fanizzi et al. DL-FOIL concept learning in description logics
S. Hellmann et al. Learning of OWL class descriptions on very large knowledge bases. International Journal on Semantic Web and Information Systems (2009)
P. Hitzler et al. Foundations of Semantic Web Technologies (2009)
