Combining ontological profiles with context in information retrieval

https://doi.org/10.1016/j.datak.2009.10.006

Abstract

An ontology is a formal conceptualization of a domain, specifying the concepts of the domain and the relations between them. It is, however, not a straightforward task to use this knowledge for information retrieval purposes. In this paper, we describe the concept of an ontological profile, which is a semantic extension of an ontology where each ontology concept is described by a vector of weighted keywords. An experiment has been conducted with a prototype search engine that uses ontological profiles for query expansion. The evaluation shows encouraging results compared to standard keyword-based search. Furthermore, we describe the notion of context in an information retrieval setting and address how semantics and context can be combined in search based on query expansion.

Introduction

Ontologies are now used in a wide range of applications and have been instrumental in many interoperability projects. They help applications work together by providing common vocabularies that describe all important domain concepts without being tied to particular applications in the domains.

However, when used as part of semantic search applications, ontologies have so far had only limited success. Early semantic search engines tried to use ontology concepts and structures as controlled search vocabularies, but this was impractical both functionally and from a usability perspective. Using ontologies for query disambiguation or reformulation seems more promising, though there is a fundamental problem with comparing ontology concepts with query or document terms. Concepts are abstract notions that are not necessarily linked to a particular term: a number of terms may refer to the same concept, and a specific term may be the realization of several different concepts. Using conceptual structures to index or retrieve document text therefore requires something that bridges the conceptual and the real world.

Another issue is tailoring ontologies to the retrieval task. Research indicates that ontologies are of little use if they are not aligned with the documents indexed by the search application. The granularity of the ontology needs to match the granularity of the document collection. While there is no need to have an elaborate ontology for a sub-domain with very few documents, it is often necessary to expand ontologies in areas that are well covered by the document collection.

This paper presents an ontology enrichment approach that both bridges the conceptual and the real world and ensures that the ontology is well adapted to the documents at hand. The idea is to provide contextual concept characterizations that reveal how the concepts are referred to semantically in the document collection. The characterizations come in the form of weighted terms that are all, in some sense, related to the concept itself. The ontology together with the concept characterizations is referred to as an ontological profile of the document collection. The approach has already been used for ontology alignment (cf. [16]), and we are now experimenting with these profiles in search and ontology learning. Our initial search prototypes show a significant improvement in search relevance, provided that the quality of the characterizations is sufficient.

Further, we describe the notion of context for information retrieval purposes. We define what a search context is and how it may later be used to improve the information retrieval task for a user in a specific situation.

Section snippets

Related work

In [16], Su describes a method for ontology mapping based on an extension of the ontology. The extension consists of constructing a feature vector for each concept of the ontology. The feature vector is made up of weighted terms reflecting the relevance between the concept and the terms found in an underlying document collection. The vectors are then used to map concepts between different ontologies.
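To make the mapping step concrete, here is a minimal sketch. It assumes that concept vectors are stored as sparse {term: weight} dictionaries and that concepts are compared with cosine similarity, a common choice for term vectors; the function names, the 0.5 threshold and the toy concepts are hypothetical and are not taken from [16].

```python
import math

Vector = dict[str, float]  # assumed representation: sparse term -> weight


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def map_concepts(
    onto_a: dict[str, Vector], onto_b: dict[str, Vector], threshold: float = 0.5
) -> dict[str, tuple[str, float]]:
    """Map each concept in onto_a to its most similar concept in onto_b,
    keeping the pair only if the similarity reaches the (hypothetical) threshold."""
    mapping = {}
    for concept_a, vec_a in onto_a.items():
        best, best_sim = None, 0.0
        for concept_b, vec_b in onto_b.items():
            sim = cosine(vec_a, vec_b)
            if sim > best_sim:
                best, best_sim = concept_b, sim
        if best is not None and best_sim >= threshold:
            mapping[concept_a] = (best, best_sim)
    return mapping


if __name__ == "__main__":
    # Toy concept vectors for two small ontologies (illustrative values only).
    onto_a = {
        "Well": {"well": 0.8, "drill": 0.5, "bore": 0.3},
        "Reservoir": {"reservoir": 0.9, "oil": 0.4},
    }
    onto_b = {
        "Borehole": {"bore": 0.7, "well": 0.6, "drill": 0.4},
        "Field": {"field": 0.8, "oil": 0.5, "reservoir": 0.3},
    }
    print(map_concepts(onto_a, onto_b))
```

With these toy vectors, Well maps to Borehole (similarity about 0.89), while Reservoir stays unmapped because its best match, Field, scores only about 0.48, below the threshold.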

In [17], Tomassen describes an ontology-driven information retrieval

Ontological profiles

As stated in Section 1, matching query terms with ontology concepts is not a straightforward task. Simply relying on a syntactic match between a query term and its conceptual counterpart has its limitations. One of the most obvious problems might be that the mapping of terms to concepts, and vice versa, is hampered by polysemy (in both directions). To address this mapping problem, we propose the ontological profile.
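The ambiguity becomes visible as soon as the concept vectors are inverted into a term index. The sketch below assumes a profile stored as concept -> {term: weight} and lists, for each term, every concept whose vector contains it, strongest first; a term with more than one candidate is ambiguous and has to be resolved with the help of the other query terms. The function names and the toy profile are hypothetical.

```python
from collections import defaultdict


def build_term_index(
    profile: dict[str, dict[str, float]],
) -> dict[str, list[tuple[str, float]]]:
    """Invert concept -> {term: weight} into term -> [(concept, weight), ...],
    with the candidate concepts for each term sorted by descending weight."""
    index: defaultdict[str, list[tuple[str, float]]] = defaultdict(list)
    for concept, vector in profile.items():
        for term, weight in vector.items():
            index[term].append((concept, weight))
    for candidates in index.values():
        candidates.sort(key=lambda cw: cw[1], reverse=True)
    return dict(index)


def ambiguous_terms(index: dict[str, list[tuple[str, float]]]) -> list[str]:
    """Terms that point to more than one concept in this profile."""
    return sorted(term for term, candidates in index.items() if len(candidates) > 1)


if __name__ == "__main__":
    # Hypothetical profile in which "bit" occurs in two unrelated concepts.
    profile = {
        "DrillBit": {"bit": 0.7, "drill": 0.6, "cutter": 0.3},
        "DataUnit": {"bit": 0.5, "byte": 0.8},
    }
    index = build_term_index(profile)
    print(index["bit"])  # both concepts are candidates for "bit"
    print(ambiguous_terms(index))
```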

An ontological profile is an extension of a domain ontology. The

Construction of ontological profiles

The ontological profile is constructed from a document collection covering the same domain as the ontology. This ensures that we find the desired connection between the domain vocabulary (at least the vocabulary used in the document collection) and the domain concepts, so that the ontological profile becomes a semantic characterization of the domain. The approach, which is based on a method described by Su [16], is described in detail in [14].
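The construction procedure itself is detailed in [14] and [16]; the following is only a minimal sketch of the general idea under explicit assumptions: each document has already been associated with one or more concepts, document terms are weighted with plain tf-idf, and a concept vector is the sum of its documents' vectors, truncated to the top_k terms and normalized to unit length. These choices, and all names in the code, are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each document's terms with tf * log(N / df) (plain tf-idf)."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return [
        {t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
        for tokens in docs
    ]


def build_profile(
    docs: list[list[str]], doc_concepts: list[set[str]], top_k: int = 10
) -> dict[str, dict[str, float]]:
    """Build one unit-length {term: weight} vector per concept.

    doc_concepts[i] is the set of concepts that document i is assumed to
    describe; how that association is obtained is outside this sketch."""
    sums: dict[str, Counter] = {}
    for vec, concepts in zip(tfidf_vectors(docs), doc_concepts):
        for concept in concepts:
            acc = sums.setdefault(concept, Counter())
            for term, weight in vec.items():
                acc[term] += weight
    profile = {}
    for concept, acc in sums.items():
        # Keep the top_k strongest terms and drop zero weights
        # (a term occurring in every document gets idf = 0).
        top = [(t, w) for t, w in acc.most_common(top_k) if w > 0]
        norm = math.sqrt(sum(w * w for _, w in top))
        profile[concept] = {t: w / norm for t, w in top} if norm else {}
    return profile


if __name__ == "__main__":
    docs = [
        ["drill", "bit", "well"],
        ["well", "reservoir", "oil"],
        ["reservoir", "oil", "pressure"],
    ]
    doc_concepts = [{"Well"}, {"Well", "Reservoir"}, {"Reservoir"}]
    for concept, vector in build_profile(docs, doc_concepts).items():
        print(concept, vector)
```

In this toy collection, Reservoir is characterized most strongly by "pressure", which occurs only in a Reservoir document, while terms shared by several documents, such as "well" and "oil", receive lower idf weights.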

Construction of the

Experiment

In this section, we report on an initial experiment that uses the ontological profile for semantic query expansion in information retrieval. The process consists of two steps: (i) query interpretation and (ii) query expansion. In (i), we map the query terms entered by the user to one or more concepts of the ontology by comparing the query terms with the terms and weights found in the concept vectors. In (ii), the mapped concepts are used for subsequent
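The two steps can be illustrated with a small sketch. The scoring rule (summing the weights of the query terms in each concept vector), the decision to add the strongest terms of the best-matching concept, and the down-weighting factor beta are all assumptions made for illustration; this is not the expansion procedure used in the experiment, and the profile values are hypothetical.

```python
def interpret_query(
    query_terms: list[str], profile: dict[str, dict[str, float]]
) -> list[tuple[str, float]]:
    """Step (i): rank concepts by the summed weight of the query terms
    in their vectors; concepts matching no query term are dropped."""
    scores = {}
    for concept, vector in profile.items():
        score = sum(vector.get(t, 0.0) for t in query_terms)
        if score > 0.0:
            scores[concept] = score
    return sorted(scores.items(), key=lambda cs: cs[1], reverse=True)


def expand_query(
    query_terms: list[str],
    profile: dict[str, dict[str, float]],
    max_concepts: int = 1,
    terms_per_concept: int = 3,
    beta: float = 0.5,
) -> dict[str, float]:
    """Step (ii): return a weighted query in which the original terms keep
    weight 1.0 and the strongest terms of the mapped concept(s) are added
    with their profile weight scaled by beta."""
    expanded = {t: 1.0 for t in query_terms}
    for concept, _ in interpret_query(query_terms, profile)[:max_concepts]:
        ranked = sorted(profile[concept].items(), key=lambda tw: tw[1], reverse=True)
        for term, weight in ranked[:terms_per_concept]:
            expanded[term] = max(expanded.get(term, 0.0), beta * weight)
    return expanded


if __name__ == "__main__":
    profile = {
        "Well": {"well": 0.6, "drill": 0.5, "bore": 0.4, "casing": 0.3},
        "Reservoir": {"reservoir": 0.7, "oil": 0.5, "porosity": 0.4},
    }
    query = ["drill", "casing"]
    print(interpret_query(query, profile))
    print(expand_query(query, profile))
```

For the query "drill casing", only Well matches, so the expanded query adds "well" and "bore" with reduced weights while the user's own terms keep full weight. Restricting expansion to the highest-scoring concept(s) is one way to keep an ambiguous term from pulling in terms from unrelated concepts.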

Context

We are currently exploring techniques to extend the information retrieval system with contextual features that provide the user with even more relevant information depending on the user context.

Many different definitions of context can be found in the literature; what they have in common is that they differ slightly from one another and are adapted to a specific application. Dey [2] proposed a definition of context that is more generic than many of the others: “Context is any information

Conclusion

In this paper, we have presented the concept of ontological profiles and how they are constructed. An ontological profile is a powerful extension to an ontology, bridging the gap between the “abstract” ontology and the use of its concepts in real documents. The ontological profile describes each concept as a vector of terms, with weights describing the strength of the relation between each term and the concept. The evaluation of using ontological profiles for query expansion shows promising results, and a generally

References (17)

  • R. Baeza-Yates et al., Modern Information Retrieval, 1999.
  • A.K. Dey, Understanding and using context, Personal and Ubiquitous Computing, 2001.
  • W.B. Frakes et al., Strength and similarity of affix removal stemming algorithms, SIGIR Forum, 2003.
  • J.A. Gulla, T. Brasethvik, G. Sveia Kvarv, Using association rules to learn concept relationships in ontologies, in:...
  • J.A. Gulla, S.L. Tomassen, D. Strasunskas, Semantic interoperability in the Norwegian petroleum industry, in: Fifth...
  • M. Havey, Essential Business Process Modeling, 2005.
  • P. Kearney, S.S. Anand, M. Shapcott, Employing a domain ontology to gain insights into user behaviour, in: Proceedings...
There are more references available in the full text version of this article.

Geir Solskinnsbakk received his M.Sc. degree in computer science (2007) from the Norwegian University of Science and Technology (NTNU). He is currently a Ph.D. student at the Department of Computer and Information Science at NTNU, working on semantics and context for information retrieval. His research interests include information retrieval, ontologies, semantics, context, and text mining.

Jon Atle Gulla has been Professor of Information Systems at the Norwegian University of Science and Technology since 2002. He received his M.Sc. in 1988 and his Ph.D. in 1993, both in Information Systems, at the Norwegian Institute of Technology. He also has an M.Sc. in Linguistics from the University of Trondheim and an M.Sc. in Management (Sloan Fellow) from London Business School. He has previously worked as a manager at Fast Search & Transfer in Munich and as a project leader for Norsk Hydro in Brussels. His research interests include text mining, semantic search, ontologies, and enterprise modeling.

This research was carried out as part of the IS A project, project no. 176755, funded by the Norwegian Research Council under the VERDIKT program.
