Spoken language understanding using weakly supervised learning

https://doi.org/10.1016/j.csl.2009.05.002

Abstract

In this paper, we present a weakly supervised learning approach to spoken language understanding in domain-specific dialogue systems. We model spoken language understanding as a two-stage classification problem. First, a topic classifier identifies the topic of an input utterance. Second, restricted to the recognized topic, slot classifiers extract the corresponding slot-value pairs. The approach is mainly data-driven and requires only a minimally annotated corpus for training, while retaining robustness and depth of understanding for spoken language. More importantly, it allows weakly supervised strategies to be employed for training the two kinds of classifiers, which can significantly reduce the number of labeled sentences. We investigate active learning and naive self-training for both kinds of classifiers. We also propose a practical method for bootstrapping topic-dependent slot classifiers from a small number of labeled sentences. Experiments have been conducted in the Chinese public transportation information inquiry domain and the English DARPA Communicator domain. The experimental results show the effectiveness of the proposed SLU framework and demonstrate that human labeling effort can be reduced significantly.
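To make the two-stage scheme concrete, the sketch below shows one possible realization assuming scikit-learn-style text classifiers. The class and method names (TwoStageSLU, parse) and the "NONE" sentinel are ours, and treating slot filling as whole-utterance classification is a simplification for brevity; the paper's own features and classifier types are not reproduced here.

# Minimal sketch of the two-stage classification pipeline described above.
# All names are illustrative assumptions, not the paper's implementation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


class TwoStageSLU:
    def __init__(self):
        # Stage 1: one multi-class classifier over topics (frame types).
        self.topic_clf = make_pipeline(TfidfVectorizer(), LinearSVC())
        # Stage 2: per-topic, per-slot classifiers.
        self.slot_clfs = {}  # {topic: {slot: classifier}}

    def fit(self, utterances, topics, slot_labels):
        # slot_labels: list of {slot: value} dicts aligned with utterances.
        self.topic_clf.fit(utterances, topics)
        for topic in set(topics):
            idx = [i for i, t in enumerate(topics) if t == topic]
            self.slot_clfs[topic] = {}
            slots = {s for i in idx for s in slot_labels[i]}
            for slot in slots:
                X = [utterances[i] for i in idx]
                y = [slot_labels[i].get(slot, "NONE") for i in idx]
                if len(set(y)) < 2:
                    continue  # need at least two classes to train a classifier
                clf = make_pipeline(TfidfVectorizer(), LinearSVC())
                clf.fit(X, y)
                self.slot_clfs[topic][slot] = clf
        return self

    def parse(self, utterance):
        # First identify the topic, then fill only the slots of that topic.
        topic = self.topic_clf.predict([utterance])[0]
        frame = {"topic": topic, "slots": {}}
        for slot, clf in self.slot_clfs.get(topic, {}).items():
            value = clf.predict([utterance])[0]
            if value != "NONE":
                frame["slots"][slot] = value
        return frame

The point of the design is that stage 2 is conditioned on stage 1: only the slot classifiers of the recognized topic are consulted, which keeps slot filling constrained by the domain model.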

Introduction

Spoken dialogue systems have attracted extensive interest from the research and industrial communities because they provide a natural interface between humans and computers, with potential benefits such as remote or hands-free access, ease of use, naturalness, and greater efficiency of interaction (Walker et al., 1997). In recent years, many spoken dialogue systems have appeared in a variety of application domains, including customer service (Price, 1990), information inquiry (Blomberg et al., 1993, Lamel et al., 1997, Zue et al., 2000), call routing (Gorin et al., 1997), planning (Allen et al., 1995), etc. The success of spoken dialogue systems relies on correctly recognizing not only what is said, which is achieved by Automatic Speech Recognition (ASR), but also what is meant, which is accomplished by Spoken Language Understanding (SLU) (Wang et al., 2005). SLU is one of the key components of a spoken dialogue system. Its task is to identify the user’s goal and extract from the input utterance the information needed to complete the query.

Although it focuses only on limited domains, SLU still faces great challenges. One challenge is robustness. In addition to the difficulties intrinsic to natural language processing, the speech recognizer inevitably makes errors. Moreover, spoken language is plagued by a large set of spontaneous speech phenomena such as false starts, self-corrections, repetitions, hesitations, ellipsis, out-of-order structures and so on. The performance of the SLU component should therefore degrade gracefully when the input utterances are ill-formed. Another challenge is portability, which concerns how quickly and at what cost SLU components for new applications or new languages can be built (Gao et al., 2005). Currently, the development of spoken language systems often relies heavily on human effort, which has been one of the main bottlenecks for the rapid development of spoken dialogue systems. Rule-based SLU approaches require linguistic experts to handcraft a domain-specific grammar for parsing, which is a time-consuming, laborious and error-prone task. On the other hand, although they require very little a-priori knowledge handcrafted by linguistic experts, general statistical SLU approaches need a large amount of labeled data to achieve reasonable performance. These drawbacks hinder efficient porting of the SLU component to new domains and languages.

In this paper, we propose a robust and portable approach to spoken language understanding with the following desirable properties:

  • It should be robust to ill-formed spoken utterances while preserving depth of understanding.

  • It should be essentially data-driven and require only minimally annotated data for training, so that it can be easily ported across different domains and languages.

  • It should be trainable with weakly supervised learning approaches, further reducing the cost of labeling training utterances.

The remainder of this paper is organized as follows. The next section reviews related work on SLU. Section 3 presents our SLU framework and describes its components in depth. Section 4 focuses on the weakly supervised training approaches for our SLU framework. Section 5 gives the experimental setup and results. Finally, Section 6 concludes the paper and outlines future work.

Section snippets

Related work

Generally, there are two mainstreams in SLU research: rule-based approaches and data-driven approaches. The two kinds of approaches can also be combined.

General knowledge source for SLU

In order to enable a spoken dialogue system to support a conversation between a human and an information back-end, it is important to model the semantic structure of the corresponding application domain. Usually, the semantic structure of an application domain is defined in terms of a set of semantic frames, often called the domain model. A semantic frame contains a frame type representing the topic of the input sentence, and some slots representing the constraints the query goal has to…
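As an illustration only, a domain model of this kind can be encoded as a set of frame definitions. The frame and slot names below are hypothetical examples in the spirit of a transportation inquiry domain, not taken from the paper.

# Illustrative encoding of a domain model as semantic frames (frame type plus slots).
from dataclasses import dataclass, field


@dataclass
class SemanticFrame:
    frame_type: str                              # topic of the utterance
    slots: dict = field(default_factory=dict)    # slot name -> filled value


# A domain model is then a mapping from frame types to their admissible slots.
# These entries are hypothetical examples.
DOMAIN_MODEL = {
    "bus_route_inquiry": ["origin", "destination", "route_number"],
    "flight_booking": ["departure_city", "arrival_city", "date"],
}

# One understood utterance yields one filled frame, e.g.:
frame = SemanticFrame("bus_route_inquiry",
                      {"origin": "Zhongguancun", "destination": "Xidan"})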

Weakly supervised training for two-stage classification based SLU

As stated before, to train the classifiers for topic identification and slot filling, we need to label each sentence in the training corpus against the semantic frame. Although this annotation scheme is relatively minimal, the labeling work is still time-consuming and costly, whereas unlabeled sentences are relatively easy to collect. Therefore, to reduce the cost of labeling training utterances, we investigate weakly supervised techniques for training the topic and slot classifiers.
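The sketch below illustrates the two weakly supervised strategies named above, naive self-training and uncertainty-based active learning, using generic scikit-learn classifiers. The confidence threshold, batch size, and margin-based selection criterion are illustrative assumptions, not the paper's settings.

# Hedged sketch of naive self-training and active-learning selection;
# hyperparameters and the selection heuristic are assumptions for illustration.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def self_train(labeled_X, labeled_y, unlabeled_X, threshold=0.9, rounds=5):
    # Naive self-training: repeatedly add confidently auto-labeled sentences
    # from the unlabeled pool to the training set and retrain.
    X, y = list(labeled_X), list(labeled_y)
    pool = list(unlabeled_X)
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    for _ in range(rounds):
        clf.fit(X, y)
        if not pool:
            break
        probs = clf.predict_proba(pool)
        keep = []
        for i, p in enumerate(probs):
            if p.max() >= threshold:          # confident prediction: pseudo-label it
                X.append(pool[i])
                y.append(clf.classes_[p.argmax()])
            else:
                keep.append(pool[i])          # stay in the pool for the next round
        pool = keep
    return clf


def select_for_labeling(clf, unlabeled_X, batch_size=10):
    # Uncertainty-based active learning: ask a human to label the sentences
    # the current classifier is least sure about (smallest margin between the
    # two most probable classes).
    probs = clf.predict_proba(unlabeled_X)
    sorted_probs = np.sort(probs, axis=1)
    margins = sorted_probs[:, -1] - sorted_probs[:, -2]
    return np.argsort(margins)[:batch_size]

In practice the two strategies are complementary: self-training grows the training set automatically from confident predictions, while active learning spends the human labeling budget on the most informative sentences.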

Data collection and experimental setting

Our experiments were carried out on two corpora. One is a Chinese corpus in the public transportation information inquiry domain. The other is the English DARPA Communicator Travel Data (Communicator Travel Data, 2004), which covers air travel, hotel reservation, car rental, etc.

We collected two kinds of corpora for the Chinese transportation information inquiry domain in different ways. First, a natural language corpus was collected through a specific website which…

Conclusion and future work

We have presented a new SLU framework based on two-stage classification. The proposed framework has the following advantages.

  • It is robust in processing spoken language: (1) the preprocessor provides low-level robustness; (2) it inherits the robustness of topic classification using statistical pattern recognition techniques, and can also use topic classification to guide slot filling; (3) the strategy of first finding the concept or slot islands and then linking them is…

References (58)

  • Carpenter, B., Chu-Carroll, J., 1998. Natural language call routing: a robust, self-organizing approach. In:...
  • Chang, C., Lin, C., 2001. LIBSVM: a library for support vector machines. Software available at...
  • Clark, S., Curran, J., Osborne, M., 2003. Bootstrapping POS-taggers using unlabeled data. In: Proceedings of...
  • Cohn, D., et al., 1994. Improving generalization with active learning. Machine Learning.
  • Collins, M., Singer, Y., 1999. Unsupervised models for named entity classification. In: Proceedings of...
  • Dowding, J., Gawron, J., Appelt, D., Bear, J., Cherny, L., Moore, R., Moran, D., 1993. GEMINI: a natural language...
  • Fu, K.S., Booth, T.R., 1975. Grammatical inference: introduction and survey, Parts I and II. IEEE Trans. Systems, Man,...
  • Gao, Y., Gu, L., Kuo, H., 2005. Portability challenges in developing interactive dialogue systems. In: Proceedings of...
  • Golding, R., 1995. A Bayesian hybrid method for context-sensitive spelling correction. In: Proceedings of Third...
  • Gupta, N., et al., 2006. The AT&T spoken language understanding system. IEEE Transactions on Speech and Audio Processing.
  • He, Y., Young, S., 2003. A data-driven spoken language understanding system. In: Proceedings of IEEE ASRU Workshop, US...
  • Kuhn, R., et al., 1995. The application of semantic classification trees to natural language understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Macherey, K., Och, F., Ney, H., 2001. Natural language understanding using statistical machine translation. In:...
  • McCallum, A., Nigam, K., 1998. Employing EM and pool-based active learning for text classification. In: Proceedings of...
  • Meng, H., et al., 2002. Semiautomatic acquisition of semantic structures for understanding domain-specific natural language queries. IEEE Transactions on Knowledge and Data Engineering.
  • Mihalcea, R., 2004. Co-training and self-training for word sense disambiguation. In: Proceedings of the Conference on...
  • Miller, S., Bobrow, R., Ingria, R., Schwartz, R., 1994. Hidden understanding models of natural language. In:...
  • Minker, W., Bennacef, S., Gauvain, J., 1996. A stochastic case frame approach for natural language understanding. In:...
  • Muslea, I., Minton, S., Knoblock, C.A., 2002. Active+semi-supervised learning=robust multi-view learning. In:...

This paper is an expanded version of two papers presented at ICSLP-2006 (Wu et al., 2006a) and EMNLP-2006 (Wu et al., 2006b). This work is supported by the National Natural Science Foundation of China (NSFC, No. 60496326) and the 863 Project of China (No. 2001AA114210-11).
