A flexible framework to experiment with ontology learning techniques
Introduction
The Semantic Web is an evolving extension of the World-Wide Web in which content is encoded in a formal and explicit way, so that it can be read and used by software agents [2]. It depends heavily on the proliferation of ontologies. An ontology is a formal conceptualization of a particular domain that is shared by a group of people. In complex domains, identifying, defining and conceptualizing the domain manually can be a costly and error-prone task. This problem can be eased by generating the ontology semi-automatically.
Most knowledge about domain entities, their properties and their relationships is embodied in text collections – with varying degrees of explicitness and precision. Ontology learning from text has therefore been among the most important strategies for building ontologies. Machine learning and automated language-processing techniques have been used to extract concepts and relationships from structured and unstructured data, such as databases and text. For instance, Cimiano et al. [7] use statistical analysis to extract terms and produce a taxonomy. Similarly, Reinberger and Spyns [21] use shallow linguistic parsing for concept formation and identify some types of relationships by means of prepositions.
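As a rough illustration of statistical term extraction (a generic frequency-based approach, not the specific method of [7]), the sketch below combines POS filtering with a frequency threshold: it keeps nouns from POS-tagged text and ranks them by how often they occur. It assumes input as (word, tag) pairs using CLAWS-style tags in which common nouns start with "NN"; the function name `extract_candidate_terms` and the parameter `min_freq` are hypothetical.

```python
from collections import Counter


def extract_candidate_terms(tagged_tokens, min_freq=2):
    """Return (term, frequency) pairs for nouns seen at least min_freq times.

    tagged_tokens is an iterable of (word, tag) pairs. Tags starting with
    "NN" are assumed to mark common nouns, as in CLAWS-style tag sets.
    """
    counts = Counter(
        word.lower() for word, tag in tagged_tokens if tag.startswith("NN")
    )
    # Frequency filtering: drop rare nouns, then sort by descending
    # frequency, breaking ties alphabetically.
    return sorted(
        ((term, n) for term, n in counts.items() if n >= min_freq),
        key=lambda pair: (-pair[1], pair[0]),
    )


tagged = [
    ("The", "AT"), ("ontology", "NN1"), ("defines", "VVZ"), ("concepts", "NN2"),
    ("An", "AT1"), ("ontology", "NN1"), ("links", "VVZ"), ("concepts", "NN2"),
    ("and", "CC"), ("relations", "NN2"),
]
print(extract_candidate_terms(tagged))  # [('concepts', 2), ('ontology', 2)]
```

Note that "concept" and "concepts" would be counted as separate terms here; whether they are merged depends on the pre-processing chosen before this step.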
Researchers have recognized that the output of the ontology learning process is far from perfect [14]. One problem is that, in most cases, it is not obvious how to use, configure and combine techniques from different fields for a specific domain. Although a few results on combining techniques have been published, for instance [23], the problem is far from solved. For example, some researchers apply different text processing techniques, such as stopwords filtering [5], lemmatization [4] or stemming [13], to generate pre-processed data as input for concept identification. However, there are no comparative studies that show the effectiveness of these linguistic pre-processing techniques. An additional problem for ontology learning is that most frameworks use a pre-defined combination of techniques. Thus, they provide neither a mechanism for experimenting with different combinations nor the ability to add new techniques. Reinberger et al. [22] point out: “To our knowledge no comparative study has been published yet on the efficiency and effectiveness of the various techniques applied to ontology learning”.
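To make concrete why the choice between stemming and lemmatization can matter for concept identification, the following sketch applies both to the same words. Both components are hypothetical toys: `toy_stem` is a crude suffix stripper standing in for a real stemmer (it is not Porter's algorithm), and `toy_lemmatize` looks words up in a tiny hand-made lexicon standing in for a real lemmatizer.

```python
# Hypothetical mini-lexicon standing in for a real lemmatizer's dictionary.
LEMMAS = {"studies": "study", "studied": "study", "classes": "class", "ran": "run"}

# Hypothetical suffix list for a crude stemmer; real stemmers such as
# Porter's apply much richer, ordered rules.
SUFFIXES = ("ies", "ied", "es", "ed", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up in the lexicon; unknown words are returned unchanged."""
    return LEMMAS.get(word, word)


words = ["studies", "studied", "classes", "ran"]
print([toy_stem(w) for w in words])       # ['stud', 'stud', 'class', 'ran']
print([toy_lemmatize(w) for w in words])  # ['study', 'study', 'class', 'run']
```

Both approaches conflate "studies" and "studied", but the stemmer produces a non-word and misses the irregular form "ran", while the lemmatizer only covers words in its lexicon. Differences of this kind are what a comparative study of pre-processing techniques would need to measure.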
Our motivation is to help make the ontology learning process controllable. To that end, it is important to know the contribution of each available technique and the efficiency of each combination of techniques. We think that the failure to evaluate the relative efficacy of different NLP techniques is likely to hinder the development of effective learning and knowledge acquisition support for ontology engineering. To address this problem, we propose both a flexible framework and an integrated tool-suite for configuring and combining techniques applied to ontology learning. The general architecture of our solution integrates an existing linguistic tool (WMatrix [20]), which provides part-of-speech (POS) and semantic tagging; an ontology workbench for information extraction; and an existing open source ontology editor, Protégé [16]. This work is part of a larger project to build ontologies semi-automatically by processing a collection of domain texts. It involves four fundamental issues: extracting the relevant domain terminology, discovering concepts, deriving a concept hierarchy, and identifying and labeling ontological relations. Our work involves the innovative adaptation, integration and application of existing NLP and machine learning techniques in order to answer the following research question:
Can shallow analysis of the kind enabled by a range of linguistic and statistical NLP and corpus linguistic techniques identify key domain concepts, and can it do so with sufficient confidence in the correctness and completeness of the result?
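The framework's central idea, configuring and combining techniques instead of hard-wiring one combination, can be sketched as a pipeline whose optional steps are switched on or off, with every on/off combination enumerated as a separate experimental configuration. This is an illustrative sketch, not the framework's implementation: the step functions, the stopword list, the fixed step order and the names `STEPS`, `run_pipeline` and `all_configurations` are all hypothetical, and `strip_plural_s` is a crude stand-in for a real stemmer.

```python
from itertools import product

# Tiny illustrative stopword list (an assumption, not the framework's list).
STOPWORDS = {"the", "of", "a", "and"}


def lowercase(tokens):
    return [t.lower() for t in tokens]


def filter_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def strip_plural_s(tokens):
    # Crude stand-in for a real stemmer (hypothetical).
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]


# Optional techniques, always applied in this order when switched on.
STEPS = [
    ("lowercase", lowercase),
    ("stopwords", filter_stopwords),
    ("stem", strip_plural_s),
]


def run_pipeline(tokens, enabled):
    """Apply the enabled steps (a set of step names) in STEPS order."""
    for name, step in STEPS:
        if name in enabled:
            tokens = step(tokens)
    return tokens


def all_configurations():
    """Yield every on/off combination of the optional steps as a frozenset."""
    for flags in product([False, True], repeat=len(STEPS)):
        yield frozenset(name for (name, _), on in zip(STEPS, flags) if on)


tokens = ["The", "Concepts", "of", "the", "Ontology"]
for config in all_configurations():
    print(sorted(config), run_pipeline(tokens, config))
```

With stopword filtering alone, the capitalized "The" survives because the stopword list is lowercase; only when case normalization is also enabled is it removed. Interactions like this are exactly what enumerating combinations, rather than fixing one, is meant to expose.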
The main contributions of our project are:
- Providing ontology engineers with a coordinated and integrated tool for knowledge object extraction and ontology modelling.
- Evaluating the contribution of different NLP and machine learning techniques, and of their combinations, to ontology learning.
- Proposing guidelines for configuring and combining techniques applied to ontology learning.
In this paper we present the results achieved so far:
- The definition of a framework that supports testing different NLP and machine learning techniques within the semi-automatic ontology learning process.
- A prototype workbench for knowledge object extraction that supports the framework. This workbench integrates a set of NLP and corpus linguistics techniques so that they can be experimented with.
- A comparative analysis of a set of linguistic and statistical techniques.
The remainder of our paper is organized as follows. We begin by introducing related work. Then, we present the main parts of the framework by describing and characterizing each of the activities that form the process. Next, we present experiments using a set of linguistic and statistical techniques. Finally, we discuss the results of the experiments and present the conclusions.
Background
In recent years, a number of frameworks that support the ontology learning process have been reported. They implement techniques from different fields, such as knowledge acquisition, machine learning, information retrieval, natural language processing, artificial intelligence reasoning and database management, as the following examples show:
- ASIUM [11] learns verb frames and taxonomic knowledge, based on statistical analysis of syntactic parsing of French texts.
- Text2Onto [6] is a complete
The ontology framework: OntoLancs
Our research project principally addresses the issue of quantitatively evaluating the usefulness or accuracy of techniques and combinations of techniques applied to ontology learning. We have integrated a first set of natural language processing, corpus linguistics and machine learning techniques for experimentation. They are: (a) POS grouping, (b) stopwords filtering, (c) frequency filtering, (d) POS filtering, (e) lemmatization, (f) stemming, (g) frequency profiling, (h) concordance, (i)
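Of the techniques listed, frequency profiling is perhaps the least self-explanatory. A common way to profile a domain corpus is to compare each word's frequency with its frequency in a general reference corpus using the two-corpus log-likelihood (G2) statistic, which is the comparison WMatrix is known for. The sketch below assumes that formulation and raw frequency counts as input; the function names `log_likelihood` and `profile` and the sample counts are hypothetical, and the framework's own settings are not given in this excerpt.

```python
import math


def log_likelihood(freq_domain, freq_reference, size_domain, size_reference):
    """Two-corpus log-likelihood (G2) score for one word.

    freq_*: the word's frequency in each corpus; size_*: total tokens in each.
    Expected frequencies assume the word is spread proportionally to corpus size.
    """
    total_freq = freq_domain + freq_reference
    total_size = size_domain + size_reference
    expected_domain = size_domain * total_freq / total_size
    expected_reference = size_reference * total_freq / total_size
    ll = 0.0
    # Terms with a zero observed frequency contribute 0 (0 * log 0 := 0).
    if freq_domain > 0:
        ll += freq_domain * math.log(freq_domain / expected_domain)
    if freq_reference > 0:
        ll += freq_reference * math.log(freq_reference / expected_reference)
    return 2 * ll


def profile(domain_counts, reference_counts):
    """Rank words over-represented in the domain corpus by log-likelihood."""
    size_d = sum(domain_counts.values())
    size_r = sum(reference_counts.values())
    scores = {}
    for word, f_d in domain_counts.items():
        f_r = reference_counts.get(word, 0)
        # Keep only words relatively more frequent in the domain corpus.
        if f_d / size_d > f_r / size_r:
            scores[word] = log_likelihood(f_d, f_r, size_d, size_r)
    return sorted(scores, key=scores.get, reverse=True)


domain = {"ontology": 30, "the": 50, "concept": 20}
reference = {"the": 500, "ontology": 1, "house": 499}
print(profile(domain, reference))  # ['ontology', 'concept']
```

"the" is dropped because its relative frequency is identical in both corpora. As a rule of thumb, a G2 score above about 3.84 corresponds to p < 0.05 with one degree of freedom, so such a threshold could serve as one possible cut-off for candidate terms.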
Experiments
In this section we describe the mechanism our framework provides for evaluating the efficacy of different NLP techniques for the crucial second phase of the ontology learning process described in Section 3.1.
The experiments were designed to extract a set of candidate concepts from a domain corpus using a combination of NLP and machine learning techniques and to check the correspondence between the candidate concepts and the classes of a DAML reference ontology. In order to assess the efficiency
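One common way to quantify how well candidate concepts correspond to the classes of a reference ontology is precision, recall and their harmonic mean (F1). The sketch below computes these under the simplifying assumption that a candidate matches a class when their lowercased names are identical; it is an illustration, not necessarily the measure used in these experiments, and the function name `evaluate_candidates` and the sample data are hypothetical.

```python
def evaluate_candidates(candidates, reference_classes):
    """Precision, recall and F1 of candidate concepts against reference classes.

    A candidate counts as correct when its lowercased form equals the
    lowercased name of a reference class (a simplifying assumption).
    """
    found = {c.lower() for c in candidates}
    gold = {r.lower() for r in reference_classes}
    hits = found & gold
    precision = len(hits) / len(found) if found else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    # F1 is the harmonic mean; it is defined as 0 when nothing matched.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


candidates = ["Ontology", "Concept", "Relation", "Agent"]
reference = ["concept", "relation", "class", "property"]
print(evaluate_candidates(candidates, reference))  # (0.5, 0.5, 0.5)
```

Real ontology class names are often compounds such as "ResearchProject", so in practice some normalization (splitting compounds, lemmatizing) would be needed before matching, and that choice can itself change the scores.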
Conclusions and further work
In this paper, we have described an ongoing project that proposes a flexible framework for the ontology learning process. This framework is designed as a cyclical process for experimenting with different techniques and combinations of techniques. It helps determine which techniques, or combinations of techniques, perform best in the ontology learning process. An ontology engineer can decide which techniques or combinations will be used to extract concepts and turn them into an
References (25)
- et al., Learning to construct knowledge bases from the World Wide Web, Artif. Intell. (2000).
- et al., Learning domain ontologies for semantic web service descriptions, J. Web Sem. (2005).
- From plain character strings to meaningful words: producing better full text databases for inflectional and compounding languages with morphological analysis software, Inf. Retr. (2001).
- et al., The Semantic Web – a new form of Web content that is meaningful to computers will unleash a revolution of new possibilities, Sci. Am. (2001).
- P. Buitelaar, M. Sintek, OntoLT version 1.0: middleware for ontology extraction from text, in: Proc. Demo Session at...
- P. Buitelaar, S. Ramaka, Unsupervised ontology-based semantic tagging for knowledge markup, in: S.B. Wray Buntine, A....
- et al., Learning ontologies to improve text clustering and classification.
- P. Cimiano, J. Volker, Text2onto – a framework for ontology learning and data-driven change discovery, in: Proc. NLDB...
- P. Cimiano, L. Schmidt-Thieme, A. Pivk, S. Staab, Learning taxonomic relations from heterogeneous evidence, in: P....
- H. Cunningham, D. Maynard, K. Bontcheva, V. Tablan, GATE: a framework and graphical development environment for robust...
- OWL Web Ontology Language Reference, W3C.
Cited by (52)
A time-sensitive historical thesaurus-based semantic tagger for deep semantic annotation
2017, Computer Speech and Language
Citation excerpt: Over recent years, various semantic lexical resources and semantic annotation tools have been developed, such as EuroWordNet (Vossen, 1998) and the UCREL (University Centre for Computer Corpus Research on Language) Semantic Analysis System (USAS) (Rayson et al., 2004), and they have played an important role in developing intelligent natural language processing (NLP) and Human language technology (HLT) systems. For example, the USAS semantic tagger has been applied in a variety of studies, including empirical language studies at the semantic level (Klebanov et al., 2008; Ooi et al., 2007; Potts and Baker, 2013; Rayson et al., 2004), studies in information technology (Doherty et al., 2006; Nakano et al., 2005; Volk et al., 2002), software engineering (Chitchyan et al., 2006; Taiani et al., 2008) and others (Balossi, 2014; Gacitua et al., 2008; Hancock et al., 2013; Markowitz and Hancock, 2014; Semino et al., 2015). In this paper, we present our work in designing, developing and evaluating the accuracy of a new semantic tagger: the “Historical-Thesaurus-based Semantic Tagger” (henceforth HTST).
Concept relation extraction using Naïve Bayes classifier for ontology-based question answering systems
2015, Journal of King Saud University - Computer and Information Sciences
Citation excerpt: Ontologies have had a great impact on several fields, e.g., biology and medicine. Most domain ontology constructions are not performed automatically (Gacitua et al., 2008). Most of the work on ontology-driven QAs tend to focus on the use of ontology for query expansion (Mc Guinness, 2004).
From semantics to pragmatics: where IS can lead in Natural Language Processing (NLP) research
2021, European Journal of Information Systems
Domain Knowledge Discovery Guided by Software Trace Links
2018, Proceedings - 2018 5th International Workshop on Artificial Intelligence for Requirements Engineering, AIRE 2018
Natural language semantic construction based on cloud database
2018, Computers, Materials and Continua