Automatic construction of a large-scale situation ontology by mining how-to instructions from the web

https://doi.org/10.1016/j.websem.2010.04.006

Abstract

With the growing interest in semantic web services and context-aware computing, the importance of ontologies, which enable context-aware reasoning, has been widely accepted. While domain-specific and general-purpose ontologies have been developed, few attempts have been made to build a situation ontology that can be employed directly to support activity-oriented context-aware services. In this paper, we propose an approach to automatically constructing a large-scale situation ontology by mining large-scale web resources, eHow and wikiHow, which contain an enormous amount of how-to instructions (e.g., “How to install a car amplifier”). The construction process is guided by a situation model derived from the procedural knowledge available in the web resources. The two major steps involved are: (1) action mining, which extracts pairs of a verb and its ingredients (i.e., objects, location, and time) from individual instructional steps (e.g., <disconnect, ground cable>) and forms goal-oriented situation cases from the results, and (2) normalization and integration of the situation cases to form the situation ontology. For validation, we measure the accuracy of the action mining method and show how our situation ontology compares in terms of coverage with existing large-scale ontology-like resources constructed manually. Furthermore, we show how it can be utilized for two applications: service recommendation and service composition.

Introduction

Ontological knowledge has become a main vehicle for semantically and conceptually oriented techniques and applications such as word sense disambiguation, searching, classification, question answering, entity resolution, and context/situation-aware reasoning for personalized services. However, currently available large-scale ontologies often fail to deal with the diverse task situations that may arise in the real world because they lack an understanding of the dynamic nature of people's daily lives and the associated activities. For example, automatically built ontologies like YAGO [12], driven by WordNet [40] and Wikipedia [39], do not have sufficient coverage of contextual instances to reason about situations and activities arising from different domains. They give no consideration to such activities of daily living as shopping, driving, weddings, etc., for which context variables like actions, location, and time should be made available. Without a situation ontology of this kind, it would not be possible to infer what activity the user is engaged in and what actions are likely to be taken from the current situation, which can be characterized with context variables like the current location, objects used, and time.

As a novel solution to the problem, we attempt to build a huge situation knowledge base of human activities, which is essential for context/situation-aware services, by means of text mining techniques that exploit the structure of how-to descriptions. Action-level knowledge is extracted from eHow and wikiHow, freely accessible websites currently storing more than one million articles on how to do things step by step, which collectively cover almost every domain of daily life including business, cars, computers, education, health, travel, weddings, etc. An article can be converted into an instance of a situation ontology model that consists of a goal, an action sequence, and contextual ingredients that include location, time, and objects. To organize such knowledge, we have defined a situation ontology specification that includes six ontology classes (topic, goal, action, object, time, and location) and six types of semantic relations (hasTopic, hasAction, hasNextAction, hasObject, hasTime, and hasLocation), all of which are derived from the eHow articles, as in Fig. 2.
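The situation model described above can be pictured as a small data structure. The following is a minimal sketch, not the authors' implementation: the class names `Action` and `SituationCase`, the step identifier format, and the choice of which node each relation attaches to are all assumptions made for illustration. Only the six relation names come from the text. The "mount amplifier in trunk" step is a made-up example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A (subject, relation, object) statement; a hypothetical flat encoding.
Triple = Tuple[str, str, str]


@dataclass
class Action:
    """One instructional step: a verb plus its contextual ingredients."""
    verb: str
    objects: List[str] = field(default_factory=list)
    location: Optional[str] = None
    time: Optional[str] = None


@dataclass
class SituationCase:
    """One how-to article: a goal, its topic, and an ordered action sequence."""
    goal: str
    topic: str
    actions: List[Action] = field(default_factory=list)

    def to_triples(self) -> List[Triple]:
        # Assumption: relations hang off the goal or off a per-step action node.
        triples: List[Triple] = [(self.goal, "hasTopic", self.topic)]
        previous_id: Optional[str] = None
        for index, action in enumerate(self.actions):
            action_id = f"{self.goal}#step{index + 1}:{action.verb}"
            triples.append((self.goal, "hasAction", action_id))
            if previous_id is not None:
                triples.append((previous_id, "hasNextAction", action_id))
            for obj in action.objects:
                triples.append((action_id, "hasObject", obj))
            if action.location is not None:
                triples.append((action_id, "hasLocation", action.location))
            if action.time is not None:
                triples.append((action_id, "hasTime", action.time))
            previous_id = action_id
        return triples


case = SituationCase(
    goal="install a car amplifier",
    topic="cars",
    actions=[
        Action(verb="disconnect", objects=["ground cable"]),
        Action(verb="mount", objects=["amplifier"], location="trunk"),  # invented step
    ],
)
triples = case.to_triples()
for triple in triples:
    print(triple)
```

Keeping each step as its own node lets the same verb appear in several articles while `hasNextAction` still records the order within one article.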

We crawled the entire set of articles from the eHow and wikiHow websites and applied natural language processing (NLP) techniques to obtain a highly refined situation ontology, which can help detect the current situation of a user in daily life and suggest a solution suitable for the problem at hand, if any. The task of the employed NLP techniques is to extract actions expressed in verb form, together with the associated contextual ingredient items, from the goal and the subsequent action sequences expressed in natural language in an article. In order to put the linguistic constituents in an ontological form, we designed four additional steps: goal normalization, action normalization, action transition probability calculation, and ingredient resolution.
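As an illustration of the action transition probability step, the sketch below estimates how likely one normalized action is to follow another by counting consecutive pairs across action sequences. This excerpt does not give the paper's exact estimator, so the plain relative-frequency estimate, the function name `transition_probabilities`, and the toy sequences are all assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def transition_probabilities(sequences: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next action | current action) by relative frequency."""
    pair_counts: Dict[str, Counter] = defaultdict(Counter)
    for sequence in sequences:
        # Walk over consecutive (current, following) action pairs.
        for current, following in zip(sequence, sequence[1:]):
            pair_counts[current][following] += 1

    probabilities: Dict[str, Dict[str, float]] = {}
    for current, followers in pair_counts.items():
        total = sum(followers.values())
        probabilities[current] = {
            following: count / total for following, count in followers.items()
        }
    return probabilities


# Toy, hand-written action sequences (not taken from eHow or wikiHow).
toy_sequences = [
    ["disconnect", "mount", "connect"],
    ["disconnect", "connect"],
    ["mount", "connect"],
]
probs = transition_probabilities(toy_sequences)
print(probs["disconnect"])
```

Actions that only ever appear last in a sequence, such as `connect` here, get no outgoing entry. A real system would probably also smooth the counts for rare pairs.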

To assess the utility of the proposed method and its outcome, we measured the accuracy and coverage of the automatically constructed ontology. Accuracy was measured on a random sample of the situation instances converted from the corresponding articles: we checked whether those instances were unambiguous and well-formed. For coverage, we compared the verbs in the resulting ontology against those in existing large-scale ontology-like resources: WordNet and OMICS [27].

This paper presents an automatic method, based on action mining from the Web, for building a large-scale situation ontology that is required to reason about user intentions (or situations) and provide relevant recommendations in a given context. Its main contribution is to show that an automatic methodology can be employed to construct a large-scale situation ontology for the situation model with high precision. Given the dynamic nature of knowledge in people's daily activities, it is critical to devise an automatic method for constructing situation ontologies. Through the application scenarios, we also show that an ontology constructed in this way can be of practical value for context-aware applications. We argue that the high accuracy of the method and the sheer size and utility of the situation ontology lend themselves to further research and development in context-aware applications involving unconstrained daily lives.

Section 2 describes the main features and drawbacks of previous work concerning situation-awareness, situation ontology, and automatic ontology construction to set the stage for our work. In Section 3, we introduce our situation ontology model and the resources from which the current situation ontology is constructed. Section 4 explains the details of our situation ontology construction process, focusing on action mining and normalization. In Section 5, we present an evaluation of the constructed ontology for its accuracy and a comparison with other ontology-like resources. Section 6 shows how the newly constructed situation ontology can be utilized in situation-aware recommendation and semantic web service composition. In Section 7, we give our conclusion and discuss future directions.

Section snippets

Related work

The notion of context-awareness in ubiquitous computing was proposed in the 1990s to address the interaction between computer systems and environments [5]. The term situation-awareness has also been used with the same meaning [13]. The notion has received a great deal of attention because it is a basis for improving the quality of decisions in a heterogeneous, highly dynamic environment [26]. The meaning of information about the perceived objects can be correctly determined when the situation or

Situation ontology

In this section, we introduce our situation model and situation ontology specification, which are driven by the content of the how-to knowledge sources, eHow and wikiHow. The model is intended to hold the action knowledge available in the resources, instead of taking a prescriptive approach for general purposes. In addition, the details of the knowledge sources are presented.

Situation ontology construction: goal-action mining

The goal of our ontology construction process is to derive an explicit specification of goals and associated actions from the how-to instructions people have created, so that they can serve as a conceptualization of situations that arise in daily life. As depicted in Fig. 4, there are two main sub-processes. The how-to articles from the eHow and wikiHow sites are first processed with both a syntactic pattern-based method and a probabilistic method so that actions (in the form of verbs) and associated

Evaluation

To validate our effort to automatically construct a situation ontology, we first show the statistics of the result, then discuss the experiment we ran on extraction accuracy and its results for both the syntactic pattern-based and the CRF-based methods. To put the result in perspective, we compare the coverage of the resulting situation ontology with that of other ontology-like resources, WordNet and OMICS, in terms of the actions covered.
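One simple way to read a coverage comparison in terms of actions is as set overlap between verb inventories. The sketch below is an assumption about how such a measure could be defined, not the metric used in the paper. The function name `verb_coverage` and the toy verb lists are hypothetical, and no real WordNet or OMICS data is loaded.

```python
from typing import Iterable


def verb_coverage(ontology_verbs: Iterable[str], reference_verbs: Iterable[str]) -> float:
    """Fraction of the reference resource's verbs that also occur in the ontology."""
    # Light normalization only (case and surrounding whitespace); no lemmatization.
    ontology = {verb.strip().lower() for verb in ontology_verbs}
    reference = {verb.strip().lower() for verb in reference_verbs}
    if not reference:
        return 0.0
    return len(ontology & reference) / len(reference)


# Toy inventories for illustration only.
ontology_verbs = ["disconnect", "mount", "connect", "Install"]
reference_verbs = ["install", "connect", "bake", "drive"]
coverage = verb_coverage(ontology_verbs, reference_verbs)
print(f"coverage of reference verbs: {coverage:.2f}")
```

Swapping the two arguments gives the complementary view: how much of the mined ontology is already present in the reference resource.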

Applications

To demonstrate the applicability of the situation ontology, we introduce two application scenarios where it can play a key role: situation-aware service recommendation and semantic web service composition. In the first application, the system attempts to infer the user's current situation by identifying the goal, which can be revealed by contextual information including the user's current location, the actions taken, and the objects used for those actions. Since the ontology currently contains about

Conclusion and future work

We presented an automatic approach to constructing a large-scale situation ontology by means of action mining from web resources. In particular, to aggregate situation knowledge from evolving web resources such as eHow.com and wikiHow.com, we defined a situation ontology model consisting of user goals, action sequences, and their context information such as objects, locations, and times, all of which are derived from how-to instructions in natural language.

The ontology

Acknowledgements

Support for this research came from the Ministry of Knowledge Economy, Korea, under the Information Technology Research Center support program supervised by the National IT Industry Promotion Agency; NIPA-2009-(C1090-0903-0008). Financial support for this work also came from a grant from the strategic technology development program 2008-F-047-02 of the Ministry of Knowledge Economy.

References (40)

  • DBpedia, A community effort to extract structured information from Wikipedia. Available at <http://dbpedia.org/>...
  • E. Agichtein et al., Snowball: extracting relations from large plain-text collections
  • E. Agirre et al., Enriching very large ontologies using the WWW
  • H. Jaygarl et al., HESA: a human-centric evolvable situation-awareness model in smart homes
  • H.M. Wallach, Conditional random fields: an introduction. Technical Report MS-CIS-04-21, University of Pennsylvania,...
  • Howto, An explanation about howto from Wikipedia. Available at <http://en.wikipedia.org/wiki/Howto> (accessed...
  • Implementing Semantic Web Services, The SESA Framework, D. Fensel, M. Kerrigan, M. Zaremba (Eds.),...
  • J. Reisinger et al., Low-cost supervision for multiple-source attribute extraction
  • K. Bellare et al., Lightly-supervised attribute extraction
  • K. Cios et al., Data Mining: A Knowledge Discovery Approach (2007)

Cited by (53)

    • A comprehensive survey of procedural video datasets

      2021, Computer Vision and Image Understanding
      Citation Excerpt :

      Examples include videos on cooking, assembly, repair, craft, beauty care, etc. While various works have investigated mining procedural knowledge from natural language sources (Perkowitz et al., 2004; Jung et al., 2010; Addis et al., 2011; Yang and Nyberg, 2015), many events are implicit and are not described explicitly in natural language. A picture is worth a thousand words, and actions speak louder than words.

    • Non-Sequential Graph Script Induction via Multimedia Grounding

      2023, Proceedings of the Annual Meeting of the Association for Computational Linguistics
    • Causal Reasoning About Entities and Events in Procedural Texts

      2023, EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023