Computer Speech & Language

Volume 53, January 2019, Pages 217-230

A Bi-LSTM memory network for end-to-end goal-oriented dialog learning

https://doi.org/10.1016/j.csl.2018.06.005

Highlights

  • An end-to-end goal-oriented dialog learning system is proposed.

  • Learning is conducted using a Bi-LSTM memory network model.

  • Performance increased with the use of metadata.

  • The Bi-LSTM memory network improves on the original and dynamic memory networks.

Abstract

We develop a model to satisfy the requirements of Dialog System Technology Challenge 6 (DSTC6) Track 1: building an end-to-end dialog system for goal-oriented applications. This task involves learning a dialog policy from transactional dialogs in a given domain, where automatic system responses are generated from the provided task-oriented dialog data (http://workshop.colips.org/dstc6/index.html). As this task is structurally similar to a question answering task (Weston et al., 2015), we employ the MemN2N architecture (Sukhbaatar et al., 2015), which outperforms models based on recurrent neural networks or long short-term memory (LSTM). However, two problems arise when applying this model to the DSTC6 task. First, we encounter an out-of-vocabulary problem, which we resolve by categorizing words that exist in the knowledge base into metadata types; these metadata types are similar to named-entity tags. Second, the original memory network model reflects temporal information only weakly, because it uses only sentence-level embeddings. Therefore, we add a bidirectional LSTM (Bi-LSTM) at the beginning of the model to better capture temporal information. The experimental results demonstrate that our model reflects temporal features well. Furthermore, our model achieves state-of-the-art performance among memory networks, and is comparable to the hybrid code networks (Ham et al., 2017) and the hierarchical LSTM model (Bai et al., 2017), which are not end-to-end architectures.

Introduction

The sixth Dialog System Technology Challenge (DSTC6) (Perez et al., 2017) set an end-to-end goal-oriented dialog learning task, which requires considering context and finding the correct answer, in sentence form, to a question in a dialog. In this task, 10 answer candidates provided for a dialog are ranked according to their probability of being correct. The challenge provides training and test sets as well as knowledge bases (KBs); the KBs contain information that is helpful for correct prediction.
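To illustrate how such KB information can be put to use, the following is a minimal sketch of the metadata categorization idea mentioned in the abstract: every word that appears as a KB value is mapped to its metadata type, similar to a named-entity tag. The KB entries, type names, and helper function below are illustrative assumptions, not the paper's exact scheme.

```python
# Minimal sketch: map words that appear as KB values to metadata types.
# All entries and type names here are hypothetical examples.
kb_types = {
    "resto_rome_cheap_1": "R_restaurant",
    "rome": "R_location",
    "italian": "R_cuisine",
    "cheap": "R_price",
}

def delexicalize(utterance: str) -> str:
    """Replace every known KB value with its metadata type token."""
    return " ".join(kb_types.get(tok, tok) for tok in utterance.lower().split())

print(delexicalize("api_call italian rome six cheap"))
# -> api_call R_cuisine R_location six R_price
```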

In this study, we modify a memory network that solves a question answering (QA) task (Weston et al., 2015) to perform the DSTC6 tasks. A memory network comprises a memory component and an attention mechanism. Compared to a standard long short-term memory (LSTM), the memory component allows more information to be stored, and the attention mechanism indicates where to focus within the memory component. Note that the original model chooses an answer word from among all possible answer words; this is the first problem encountered in this approach, as the answer required for the DSTC6 task must be in sentence form. Thus, the proposed model chooses an answer from among all possible candidates using the method previously introduced by Bordes and Weston (2016), which we discuss in Sections 2 and 3. Further, we added an additional memory representation layer, called the D_layer, because we expected that one output layer would be insufficient to represent a sentence, given that it is commonly used to represent a one-word answer. The added output memory layer was expected to yield a higher-precision answer sentence. However, experimental results indicated that our hypothesis was incorrect, as the model's performance degraded slightly when the D_layer was employed.
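The following minimal sketch shows a single memory hop with candidate ranking in this style, combining the MemN2N read step (Sukhbaatar et al., 2015) with sentence-level answer selection (Bordes and Weston, 2016). All dimensions, variable names, and random values are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d = 64        # embedding dimension (illustrative)
n_mem = 20    # dialog history sentences stored in memory
n_cand = 10   # answer candidates to rank

rng = np.random.default_rng(0)
m = rng.normal(size=(n_mem, d))   # input memory embeddings (sentence level)
c = rng.normal(size=(n_mem, d))   # output memory embeddings
q = rng.normal(size=(d,))         # embedded user question
W = rng.normal(size=(n_cand, d))  # embedded answer candidate sentences

# One memory hop: attend over memory, add the read vector to the query.
p = softmax(m @ q)   # attention weights over memory slots
o = p @ c            # weighted sum of output embeddings
u = q + o            # updated controller state

# Rank whole candidate sentences instead of predicting a single word.
scores = softmax(W @ u)
print("predicted candidate index:", int(np.argmax(scores)))
```

In practice the embeddings are learned, several such hops can be stacked, and the highest-scoring candidate is returned as the system response.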

We therefore added a fully connected layer, called the word feature layer, at the end of the model, which improved the model's performance to some extent. Finally, instead of the D_layer and word feature layer, we added a bidirectional LSTM (Bi-LSTM) at the beginning of the model to create sentence embeddings with better semantic and syntactic information. These changes increased the answer accuracy. Other teams used a hybrid code network (Ham et al., 2017) and a hierarchical LSTM model (Bai et al., 2017), which achieved the best results in the challenge; however, these approaches require several additional modules based on machine learning or hand-crafted rules. In comparison with Ham et al. (2017) and Bai et al. (2017), our model is a strictly end-to-end architecture, yet achieves only slightly lower performance. Another team used a dynamic memory network (Shin and Cha, 2017), but its performance was poor owing to its word-level answer prediction architecture. We therefore compared our approach against the dynamic memory network plus (Xiong et al., 2016), which performs better on question answering tasks (Weston et al., 2015); we modified it for DSTC6.
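As a rough sketch of this sentence-encoding step, the following shows how a Bi-LSTM can produce the sentence embeddings that are written into memory; the class name, dimensions, and PyTorch framing are our own assumptions rather than the authors' implementation:

```python
import torch
import torch.nn as nn

class BiLSTMSentenceEncoder(nn.Module):
    """Encode a token sequence into one sentence vector by running a
    Bi-LSTM and concatenating the final forward and backward states."""
    def __init__(self, vocab_size: int, emb_dim: int = 64, hidden: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len)
        x = self.embed(token_ids)
        _, (h_n, _) = self.lstm(x)                   # h_n: (2, batch, hidden)
        return torch.cat([h_n[0], h_n[1]], dim=-1)   # (batch, 2 * hidden)

# Each dialog history sentence would be encoded this way before being
# written into memory, replacing the bag-of-words sentence embedding.
enc = BiLSTMSentenceEncoder(vocab_size=1000)
sent = torch.randint(0, 1000, (1, 12))   # one 12-token sentence
print(enc(sent).shape)                    # torch.Size([1, 64])
```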

Section snippets

Introducing the DSTC6 Track 1 tasks

Track 1 of DSTC6 comprises five tasks (Perez et al., 2017). We explain them briefly below; readers interested in more detailed descriptions are referred to Bordes and Weston (2016). Task 1 (T1) is issuing application program interface (API) calls. A user request defines a query that contains between 0 and 4 of the required fields (sampled uniformly). Fig. 1 shows an example with three fields. The bot should ask questions to fill in the missing fields and eventually generate the correct
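To make the task format concrete, here is a hypothetical exchange in the style of the restaurant-domain dialogs of Bordes and Weston (2016); the wording, field values, and slot order are illustrative assumptions:

```
user: may i have a table in a cheap price range in rome
bot:  i'm on it. any preference on a type of cuisine?
user: with italian food
bot:  how many people would be in your party?
user: we will be six
bot:  ok, let me look into some options for you
bot:  api_call italian rome six cheap
```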

Proposed model

We propose a model to appropriately solve all the DSTC6 Track 1 tasks, achieving end-to-end goal-oriented dialog learning that finds the correct answer in sentence form to a question in dialog. As noted above, the proposed model chooses an answer sentence from among all possible candidates, the same approach as that used by Bordes and Weston (2016). The training set contains many similar answer candidate utterances. We attempt to predict an answer sentence using word-level

Baseline approaches

Bordes and Weston (2016) describe four architectures appropriate for end-to-end goal-oriented dialog learning: term frequency-inverse document frequency (TF-IDF) matching, nearest neighbors, supervised embedding models (in which embeddings are trained with a margin ranking loss), and memory networks. Of these methods, TF-IDF matching performs the worst by far. The nearest neighbor and supervised embedding methods exhibit similar performance for T2 (updating API calls) and T3 (providing API
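For reference, a minimal sketch of the TF-IDF matching baseline: rank the candidates by the cosine similarity between the TF-IDF vectors of the dialog history and of each candidate. The toy history and candidate responses below are invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

history = "good morning may i have a table with italian food in rome"
candidates = [
    "how many people would be in your party",
    "what do you think of this option",
    "is there anything else i can help you with",
]

# Fit TF-IDF on the history plus candidates, then pick the candidate
# whose vector is closest to the history vector.
vec = TfidfVectorizer()
X = vec.fit_transform([history] + candidates)
sims = cosine_similarity(X[0], X[1:]).ravel()
print(candidates[sims.argmax()])
```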

Conclusions

We have developed a model that effectively completes the tasks of DSTC6 Track 1, i.e., end-to-end goal-oriented dialog learning that finds the correct answer in sentence form to a question in dialog. We found that memory networks using metadata features yield satisfactory performance on the challenge tasks. Moreover, the addition of a word feature layer improves performance further. However, no improvement was obtained from the additional output representation layer (Kim et al., 2017).

Acknowledgments

This material is based upon work supported by the Ministry of Trade, Industry & Energy (MOTIE, Korea) under the Industrial Technology Innovation Program (No. 10063424, ‘Development of distant speech recognition and multi-task dialog processing technologies for indoor conversational robots’).

References (17)

  • Bai, Z., Yu, B., Chen, G., Wang, B., Wang, Z., 2017. Modeling Conversations to Learn Responding Policies of E2E...
  • Bordes, A., Weston, J., 2016. Learning End-to-End Goal-Oriented Dialog. arXiv preprint: 1605.07683. Accepted as a...
  • Greff, K., Srivastava, R. K., Schmidhuber, J., 2016. Highway and Residual Networks Learn Unrolled Iterative...
  • Ham, J., Lim, S., Kim, K.-E., 2017. Extended Hybrid Code Networks for DSTC6 Fair Dialog Dataset. DSTC...
  • Huang, Z., Xu, W., Yu, K., 2015. Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv preprint:...
  • Kim, B., Chung, K., Lee, J., Seo, J., Koo, M.-W., 2017. End-to-end Goal-Oriented Dialog Learning Based on Memory...
  • Kumar, A., et al., 2016. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. Proceedings of the 2016 International Conference on Machine Learning.
  • Liu, F., Baldwin, T., Cohn, T., 2017. Capturing Long-Range Contextual Dependencies with Memory-Enhanced Conditional...

Cited by (18)

  • New mist-edge-fog-cloud system architecture for thermal error prediction and control enabled by deep-learning

    2022, Engineering Applications of Artificial Intelligence
    Citation Excerpt:

    Kim et al. found that the traditional LSTM network reflects temporal information only weakly, and so a Bi-LSTM network, which is strong at processing temporal characteristics, was introduced into the dialog system. The results suggested that the Bi-LSTM memory network performs better than the traditional LSTM network (Kim et al., 2018). A Bi-LSTM network can store input information from both the forward and backward directions over long spans.

  • Simultaneous geometric and thermal error control of gear profile grinder based on analytical correlation between tooth surface error and position error of grinding wheel/workpiece

    2022, Mechanism and Machine Theory
    Citation Excerpt:

    Then the Bi-LSTMN, which has a strong ability to process temporal characteristics, was introduced into the dialog system. The storage performance of the Bi-LSTMN is better than that of the traditional LSTMN because the bidirectional LSTMN (Bi-LSTMN) allows information to be transferred in both directions [25]. The predictive accuracy is thereby improved, and the Bi-LSTMN is widely used owing to its ability to incorporate future information.

  • A Novel Approach for Building Domain-Specific Chatbots by Exploring Sentence Transformers-based Encoding

    2023, 2023 International Conference on IT and Industrial Technologies, ICIT 2023
  • TRGM: Generating Informative Responses for Open Domain Dialogue Systems

    2022, Journal of Information Science and Engineering

This paper has been recommended for acceptance by Roger Moore.
