Spoken language understanding using weakly supervised learning☆
Introduction
Spoken dialogue systems have attracted extensive interest from the research and industrial communities because they provide a natural interface between humans and computers, with potential benefits such as remote or hands-free access, ease of use, naturalness, and greater efficiency of interaction (Walker et al., 1997). In recent years, many spoken dialogue systems have appeared in a variety of application domains, including customer service (Price, 1990), information inquiry (Blomberg et al., 1993, Lamel et al., 1997, Zue et al., 2000), call routing (Gorin et al., 1997), planning (Allen et al., 1995), etc. The success of spoken dialogue systems relies on the correct recognition not only of what is said, which is achieved by Automatic Speech Recognition (ASR), but also of what is meant, which is accomplished by Spoken Language Understanding (SLU) (Wang et al., 2005). SLU is one of the key components in a spoken dialogue system. Its task is to identify the user's goal and extract from the input utterance the information needed to complete the query.
Although it focuses only on limited domains, SLU still faces great challenges. One challenge is the robustness problem. In addition to the difficulties intrinsic to natural language processing, the speech recognizer inevitably makes errors. Also, spoken language is plagued with a large set of spontaneous speech phenomena such as false starts, self-corrections, repetitions and hesitations, ellipsis, out-of-order structures, and so on. Thus, the performance of the SLU component should deteriorate gracefully when the input utterances are ill-formed. Another challenge is the portability problem, which concerns how flexibly new SLU components for new applications or new languages can be built quickly at a reasonable cost (Gao et al., 2005). Currently, the development of spoken language systems often relies heavily on human effort, which has been one of the main bottlenecks for the rapid development of spoken dialogue systems. Rule-based SLU approaches require linguistic experts to handcraft a domain-specific grammar for parsing, which is a time-consuming, laborious, and error-prone task. On the other hand, although they require very little a priori knowledge handcrafted by linguistic experts, general statistical SLU approaches need a large amount of labeled data to achieve reasonable performance. These drawbacks prevent the efficient porting of the SLU component to new domains and languages.
In this paper, we propose a robust and portable approach for spoken language understanding with the following desirable properties:
- •
It should be robust to ill-formed spoken utterances while preserving the depth of understanding.
- •
It should be essentially data-driven and require only minimally annotated data for training, so that it can be ported easily across different domains and languages.
- •
It can be trained using weakly supervised learning approaches, further reducing the cost of labeling training utterances.
The remainder of this paper is organized as follows. The next section introduces related work on SLU. Section 3 presents our SLU framework and describes its components in depth. Section 4 focuses on the weakly supervised training approaches for our SLU framework. Section 5 gives the experimental setup and results. Finally, Section 6 concludes the paper and outlines future work.
Related work
Generally, there are two main streams in SLU research: rule-based approaches and data-driven approaches. These two kinds of approaches can also be combined.
General knowledge source for SLU
In order to enable a spoken dialogue system to support a conversation between a human and an information back-end, it is important to model the semantic structure of the corresponding application domain. Usually, the semantic structure of an application domain is defined in terms of a set of semantic frames, which is often called the domain model. A semantic frame contains a frame type representing the topic of the input sentence, and some slots representing the constraints the query goal has to satisfy.
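A semantic frame of this kind can be represented as a simple data structure. The sketch below is illustrative only; the frame type and slot names (`RouteInquiry`, `origin`, `destination`) are hypothetical examples for a transit-inquiry domain, not the paper's actual domain model.

```python
from dataclasses import dataclass, field

@dataclass
class SemanticFrame:
    """A semantic frame: a frame type (the topic of the utterance)
    plus slots holding the constraints of the query goal."""
    frame_type: str
    slots: dict = field(default_factory=dict)

# A possible frame for "How do I get from the station to the airport?"
frame = SemanticFrame(
    frame_type="RouteInquiry",
    slots={"origin": "station", "destination": "airport"},
)
print(frame.frame_type)     # RouteInquiry
print(sorted(frame.slots))  # ['destination', 'origin']
```

Filling such a frame from an utterance is exactly the task the SLU component must perform.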
Weakly supervised training for two-stage classification based SLU
As stated before, to train the classifiers for topic identification and slot filling, we need to label each sentence in the training corpus against a semantic frame. Although this annotation scenario is relatively minimal, the labeling work is still time-consuming and costly. Meanwhile, unlabeled sentences are relatively easy to collect. Therefore, to reduce the cost of labeling training utterances, we investigate weakly supervised techniques for training the topic and slot classifiers.
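One common weakly supervised scheme of this kind is self-training: a classifier trained on the small labeled seed set labels the unlabeled pool, and its most confident predictions are added back as training data. The sketch below is a minimal illustration using a toy bag-of-words centroid classifier; the utterances, labels, and threshold are invented for the example and do not come from the paper.

```python
from collections import Counter

def train_centroids(labeled):
    """Fit one bag-of-words centroid (word-count profile) per topic label."""
    centroids = {}
    for text, label in labeled:
        centroids.setdefault(label, Counter()).update(text.split())
    return centroids

def classify(centroids, text):
    """Return (label, score), where score is word overlap with the best centroid."""
    words = text.split()
    best = max(centroids, key=lambda l: sum(centroids[l][w] for w in words))
    return best, sum(centroids[best][w] for w in words)

def self_train(labeled, unlabeled, rounds=3, threshold=2):
    """Self-training loop: repeatedly absorb confidently labeled utterances."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        centroids = train_centroids(labeled)
        confident, rest = [], []
        for text in pool:
            label, score = classify(centroids, text)
            (confident if score >= threshold else rest).append((text, label))
        if not confident:                  # nothing confident left: stop early
            break
        labeled.extend(confident)
        pool = [t for t, _ in rest]
    return train_centroids(labeled)

# Toy transit-domain seed data and unlabeled pool (hypothetical)
seed = [("which bus goes to the airport", "route"),
        ("when does the last bus leave", "schedule")]
unlab = ["bus to the airport please", "last train schedule tonight"]
model = self_train(seed, unlab)
print(classify(model, "bus to airport")[0])  # route
```

Real systems replace the centroid classifier with a statistical one and use calibrated confidence scores, but the labeled-pool/unlabeled-pool loop has this shape.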
Data collection and experimental setting
Our experiments were carried out on two corpora. One is a Chinese corpus from the public transportation information inquiry domain. The other is the English DARPA Communicator Travel Data (Communicator Travel Data, 2004), which is related to air travel, hotel reservation, car rental, etc.
We collected two kinds of corpora for the Chinese transportation information inquiry domain in different ways. Firstly, a natural language corpus was collected through a specific website which
Conclusion and future work
We have presented a new SLU framework based on two-stage classification. The proposed framework has the following advantages.
- •
It is robust in processing spoken language: (1) the preprocessor provides low-level robustness; (2) it inherits the robustness of topic classification using statistical pattern recognition techniques, and can also use topic classification to guide slot filling; (3) the strategy of first finding the concept or slot islands and then linking them is
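The overall control flow of a two-stage classification SLU pipeline can be sketched as follows. This is a schematic illustration only: the toy classifiers, topic names, and slot names are invented for the example, and the real framework uses trained statistical classifiers rather than keyword rules.

```python
def understand(utterance, topic_clf, slot_clfs):
    """Two-stage SLU sketch: first identify the topic, then run only
    the slot classifiers associated with that topic (topic-guided filling)."""
    topic = topic_clf(utterance)                      # stage 1: topic identification
    slots = {}
    for slot_name, clf in slot_clfs.get(topic, {}).items():
        value = clf(utterance)                        # stage 2: slot filling
        if value is not None:
            slots[slot_name] = value
    return {"topic": topic, "slots": slots}

# Toy stand-in classifiers for illustration only
topic_clf = lambda u: "route" if "to" in u.split() else "schedule"
slot_clfs = {"route": {"destination": lambda u: u.split()[-1]}}
print(understand("bus to airport", topic_clf, slot_clfs))
# {'topic': 'route', 'slots': {'destination': 'airport'}}
```

Conditioning the second stage on the predicted topic is what lets the framework restrict slot filling to slots that are meaningful for the identified query type.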
References (58)
- et al., 1997. How may I help you? Speech Communication.
- et al., 2005. Semantic processing using the hidden vector state model. Computer Speech and Language.
- et al., 1997. The LIMSI RailTel system: field trial of a telephone service for rail travel information. Speech Communication.
- et al., 2005. Combining active and semi-supervised learning for spoken language understanding. Speech Communication.
- Abney, S., 2002. Bootstrapping. In: Proceedings of ACL, Philadelphia, PA, pp....
- et al., 1995. The TRAINS project: a case study in building a conversational planning agent. Journal of Experimental and Theoretical Artificial Intelligence.
- Blomberg, M., Carlson, R., Elenius, K., Granstrom, B., Gustafson, J., Hunnicutt, S., Lindell, R., Neovius, L., 1993. An...
- Blum, A., Mitchell, T., 1998. Combining labeled and unlabeled data with co-training. In: Proceedings of COLT, Madison,...
- Communicator Travel Data, 2004. University of Colorado at Boulder, URL:...
- CU Phoenix Parser, 2003. University of Colorado at Boulder, URL:...
- Improving generalization with active learning. Machine Learning.
- The AT&T spoken language understanding system. IEEE Transactions on Speech and Audio Processing.
- The application of semantic classification trees to natural language understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Semiautomatic acquisition of semantic structures for understanding domain-specific natural language queries. IEEE Transactions on Knowledge and Data Engineering.
- ☆
This paper is an expanded version of two papers presented at ICSLP-2006 (Wu et al., 2006a) and EMNLP-2006 (Wu et al., 2006b). This work was supported by the National Natural Science Foundation of China (NSFC, No. 60496326) and the 863 Project of China (No. 2001AA114210-11).