Computer Speech & Language

Volume 53, January 2019, Pages 217-230

A Bi-LSTM memory network for end-to-end goal-oriented dialog learning

https://doi.org/10.1016/j.csl.2018.06.005

Highlights

  • An end-to-end goal-oriented dialog learning system is proposed.

  • Learning is conducted using a Bi-LSTM memory network model.

  • Performance increased with the use of metadata.

  • The Bi-LSTM memory network improves on the original and dynamic memory networks.

Abstract

We develop a model to satisfy the requirements of Dialog System Technology Challenge 6 (DSTC6) Track 1: building an end-to-end dialog system for goal-oriented applications. This task involves learning a dialog policy from transactional dialogs in a given domain, where automatic system responses are generated from the provided task-oriented dialog data (http://workshop.colips.org/dstc6/index.html). As this task is structurally similar to a question answering task (Weston et al., 2015), we employ the MemN2N architecture (Sukhbaatar et al., 2015), which outperforms models based on recurrent neural networks or long short-term memory (LSTM). However, two problems arise when applying this model to the DSTC6 task. First, we encounter an out-of-vocabulary problem, which we resolve by categorizing words that exist in the knowledge base into metadata types; these metadata types are similar to named-entity tags. Second, the original memory network model reflects temporal information only weakly, because it uses only sentence-level embeddings. Therefore, we add a bidirectional LSTM (Bi-LSTM) at the beginning of the model to better capture temporal information. The experimental results demonstrate that our model reflects temporal features well. Furthermore, our model achieves state-of-the-art performance among memory networks, and is comparable to the hybrid code networks (Ham et al., 2017) and the hierarchical LSTM model (Bai et al., 2017), which are not end-to-end architectures.

Introduction

The sixth Dialog System Technology Challenge (DSTC6) (Perez et al., 2017) set an end-to-end goal-oriented dialog learning task, which requires considering context and finding the correct answer, in sentence form, to a question in a dialog. In this task, 10 answer candidates provided for a dialog are ranked according to their probability of being correct. The challenge provides training and test sets as well as knowledge bases (KBs); the KBs contain information that is helpful for correct prediction.
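To illustrate how such KB information can be put to use, the following is a minimal sketch of the metadata categorization idea mentioned in the abstract: every word that appears as a KB value is mapped to its metadata type, similar to a named-entity tag. The KB entries, type names, and helper function below are illustrative assumptions, not the paper's exact scheme.

```python
# Minimal sketch: map words that appear as KB values to metadata types.
# All entries and type names here are hypothetical examples.
kb_types = {
    "resto_rome_cheap_1": "R_restaurant",
    "rome": "R_location",
    "italian": "R_cuisine",
    "cheap": "R_price",
}

def delexicalize(utterance: str) -> str:
    """Replace every known KB value with its metadata type token."""
    return " ".join(kb_types.get(tok, tok) for tok in utterance.lower().split())

print(delexicalize("api_call italian rome six cheap"))
# -> api_call R_cuisine R_location six R_price
```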

In this study, we modify a memory network that solves a question answering (QA) task (Weston et al., 2015) to perform the DSTC6 tasks. A memory network comprises a memory component and an attention mechanism. Compared to a standard long short-term memory (LSTM), the memory component allows more information to be stored, and the attention mechanism indicates where to focus within the memory component. Note that the original model chooses an answer word from among all possible answer words; this is the first problem encountered in this approach, as the answer required for the DSTC6 task must be in sentence form. Thus, the proposed model chooses an answer from among all possible candidates using the method previously introduced by Bordes and Weston (2016), which we discuss in Sections 2 and 3. Further, we added an additional memory representation layer, called the D_layer, because we expected that one output layer would be insufficient to represent a sentence, given that it is commonly used to represent a one-word answer. The added output memory layer was expected to yield a higher-precision answer sentence. However, experimental results indicated that our hypothesis was incorrect, as the model's performance degraded slightly when the D_layer was employed.
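The following minimal sketch shows a single memory hop with candidate ranking in this style, combining the MemN2N read step (Sukhbaatar et al., 2015) with sentence-level answer selection (Bordes and Weston, 2016). All dimensions, variable names, and random values are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d = 64        # embedding dimension (illustrative)
n_mem = 20    # dialog history sentences stored in memory
n_cand = 10   # answer candidates to rank

rng = np.random.default_rng(0)
m = rng.normal(size=(n_mem, d))   # input memory embeddings (sentence level)
c = rng.normal(size=(n_mem, d))   # output memory embeddings
q = rng.normal(size=(d,))         # embedded user question
W = rng.normal(size=(n_cand, d))  # embedded answer candidate sentences

# One memory hop: attend over memory, add the read vector to the query.
p = softmax(m @ q)   # attention weights over memory slots
o = p @ c            # weighted sum of output embeddings
u = q + o            # updated controller state

# Rank whole candidate sentences instead of predicting a single word.
scores = softmax(W @ u)
print("predicted candidate index:", int(np.argmax(scores)))
```

In practice the embeddings are learned, several such hops can be stacked, and the highest-scoring candidate is returned as the system response.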

We therefore added a fully connected layer, called the word feature layer, at the end of the model, which improved the model's performance to some extent. Finally, instead of the D_layer and word feature layer, we added a bidirectional LSTM (Bi-LSTM) at the beginning of the model to create sentence embeddings with better semantic and syntactic information. These changes increased the answer accuracy. Other teams used a hybrid code network (Ham et al., 2017) and a hierarchical LSTM model (Bai et al., 2017), which achieved the best results in the challenge; however, these approaches require several additional modules based on machine learning or hand-crafted rules. In comparison with Ham et al. (2017) and Bai et al. (2017), our model is a strictly end-to-end architecture, yet achieves only slightly lower performance. Another team used a dynamic memory network (Shin and Cha, 2017), but its performance was poor owing to its word-level answer prediction architecture. We therefore compared our approach against the dynamic memory network plus (Xiong et al., 2016), which performs better on question answering tasks (Weston et al., 2015); we modified it for DSTC6.
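As a rough sketch of this sentence-encoding step, the following shows how a Bi-LSTM can produce the sentence embeddings that are written into memory; the class name, dimensions, and PyTorch framing are our own assumptions rather than the authors' implementation:

```python
import torch
import torch.nn as nn

class BiLSTMSentenceEncoder(nn.Module):
    """Encode a token sequence into one sentence vector by running a
    Bi-LSTM and concatenating the final forward and backward states."""
    def __init__(self, vocab_size: int, emb_dim: int = 64, hidden: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len)
        x = self.embed(token_ids)
        _, (h_n, _) = self.lstm(x)                   # h_n: (2, batch, hidden)
        return torch.cat([h_n[0], h_n[1]], dim=-1)   # (batch, 2 * hidden)

# Each dialog history sentence would be encoded this way before being
# written into memory, replacing the bag-of-words sentence embedding.
enc = BiLSTMSentenceEncoder(vocab_size=1000)
sent = torch.randint(0, 1000, (1, 12))   # one 12-token sentence
print(enc(sent).shape)                    # torch.Size([1, 64])
```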

Section snippets

Introducing the DSTC6 Track 1 tasks

Track 1 of DSTC6 comprises five tasks (Perez et al., 2017). We explain them briefly below; readers interested in more detailed descriptions are referred to Bordes and Weston (2016). Task 1 (T1) is issuing application program interface (API) calls. A user request defines a query that contains between 0 and 4 of the required fields (sampled uniformly). Fig. 1 shows an example with three fields. The bot should ask questions to fill in the missing fields and eventually generate the correct
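To make the task format concrete, here is a hypothetical exchange in the style of the restaurant-domain dialogs of Bordes and Weston (2016); the wording, field values, and slot order are illustrative assumptions:

```
user: may i have a table in a cheap price range in rome
bot:  i'm on it. any preference on a type of cuisine?
user: with italian food
bot:  how many people would be in your party?
user: we will be six
bot:  ok, let me look into some options for you
bot:  api_call italian rome six cheap
```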

Proposed model

We propose a model to appropriately solve all the DSTC6 Track 1 tasks, achieving end-to-end goal-oriented dialog learning that finds the correct answer in sentence form to a question in dialog. As noted above, the proposed model chooses an answer sentence from among all possible candidates, the same approach as that used by Bordes and Weston (2016). The training set contains many similar answer candidate utterances. We attempt to predict an answer sentence using word-level

Baseline approaches

Bordes and Weston (2016) describe four architectures appropriate for end-to-end goal-oriented dialog learning: term frequency-inverse document frequency (TF-IDF) matching, nearest neighbors, supervised embedding models (in which embeddings are trained with a margin ranking loss), and memory networks. Of these methods, TF-IDF matching performs the worst by far. The nearest neighbor and supervised embedding methods exhibit similar performance for T2 (updating API calls) and T3 (providing API
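For reference, a minimal sketch of the TF-IDF matching baseline: rank the candidates by the cosine similarity between the TF-IDF vectors of the dialog history and of each candidate. The toy history and candidate responses below are invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

history = "good morning may i have a table with italian food in rome"
candidates = [
    "how many people would be in your party",
    "what do you think of this option",
    "is there anything else i can help you with",
]

# Fit TF-IDF on the history plus candidates, then pick the candidate
# whose vector is closest to the history vector.
vec = TfidfVectorizer()
X = vec.fit_transform([history] + candidates)
sims = cosine_similarity(X[0], X[1:]).ravel()
print(candidates[sims.argmax()])
```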

Conclusions

We have developed a model that effectively completes the tasks of DSTC6 Track 1, i.e., end-to-end goal-oriented dialog learning that finds the correct answer in sentence form to a question in dialog. We found that memory networks using metadata features yield satisfactory performance on the challenge tasks. Moreover, the addition of a word feature layer improves performance further. However, no improvement was obtained from the additional output representation layer (Kim et al., 2017).

Acknowledgments

This material is based upon work supported by the Ministry of Trade, Industry & Energy (MOTIE, Korea) under the Industrial Technology Innovation Program (No. 10063424, ‘Development of distant speech recognition and multi-task dialog processing technologies for indoor conversational robots’).

References (17)

  • Bai, Z., Yu, B., Chen, G., Wang, B., Wang, Z., 2017. Modeling Conversations to Learn Responding Policies of E2E...
  • Bordes, A., Weston, J., 2016. Learning End-to-End Goal-Oriented Dialog. arXiv preprint: 1605.07683. Accepted as a...
  • Greff, K., Srivastava, R. K., Schmidhuber, J., 2016. Highway and Residual Networks Learn Unrolled Iterative...
  • Ham, J., Lim, S., Kim, K.-E., 2017. Extended Hybrid Code Networks for DSTC6 Fair Dialog Dataset. DSTC...
  • Huang, Z., Xu, W., Yu, K., 2015. Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv preprint:...
  • Kim, B., Chung, K., Lee, J., Seo, J., Koo, M.-W., 2017. End-to-end Goal-Oriented Dialog Learning Based on Memory...
  • Kumar, A., et al., 2016. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. Proceedings of the 2016 International Conference on Machine Learning.
  • Liu, F., Baldwin, T., Cohn, T., 2017. Capturing Long-Range Contextual Dependencies with Memory-Enhanced Conditional...

Cited by (18)

  • New mist-edge-fog-cloud system architecture for thermal error prediction and control enabled by deep-learning

    2022, Engineering Applications of Artificial Intelligence
    Citation Excerpt:

    Kim et al. found that the traditional LSTM network reflects temporal information only weakly, and so a Bi-LSTM network, which is strong at processing temporal characteristics, was introduced into the dialog system. The results suggested that the Bi-LSTM memory network performs better than the traditional LSTM network (Kim et al., 2018). A Bi-LSTM network can store input information from both the forward and backward directions over long spans.

  • Simultaneous geometric and thermal error control of gear profile grinder based on analytical correlation between tooth surface error and position error of grinding wheel/workpiece

    2022, Mechanism and Machine Theory
    Citation Excerpt:

    Then the Bi-LSTMN, which has a strong ability to process temporal characteristics, was introduced into the dialog system. The storage performance of the Bi-LSTMN is better than that of the traditional LSTMN because the bidirectional LSTMN (Bi-LSTMN) allows information to be transferred in both directions [25]. The predictive accuracy is thereby improved, and the Bi-LSTMN is widely used owing to its ability to incorporate future information.

  • A Novel Approach for Building Domain-Specific Chatbots by Exploring Sentence Transformers-based Encoding

    2023, 2023 International Conference on IT and Industrial Technologies, ICIT 2023
  • TRGM: Generating Informative Responses for Open Domain Dialogue Systems

    2022, Journal of Information Science and Engineering

This paper has been recommended for acceptance by Roger Moore.
