Abstract
Dialog state tracking (DST) maintains and updates the dialog state at each time step as the dialog progresses, so it must take the dialog history into account. Previous word-based DST models treated the historical utterances as a single word sequence and used n-grams from that sequence as model inputs, an approach that suffers from data sparseness. This paper proposes a cascaded deep neural network framework for DST that alleviates data sparseness by exploiting the hierarchical structure of dialog. The bottom layer of the cascaded framework, implemented by a Long Short-Term Memory (LSTM) network or a Convolutional Neural Network (CNN), encodes the word sequence of each dialog turn into a sentence embedding; the upper layer, an LSTM, integrates the turn representations step by step to obtain the dialog state. The cascaded models integrate natural language understanding into DST, and the entire network is trained as a whole. Experimental results on the DSTC2 dataset indicate that the proposed models, LSTM+LSTM and CNN+LSTM, achieve better performance than existing models.
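The two-level structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the width-2 convolution, the omission of bias terms, the single-slot softmax output, and all class and function names (`TurnEncoder`, `DialogLSTM`, `track`) are assumptions made for illustration, and the weights are random rather than trained. It shows only the CNN+LSTM flow: each turn is encoded into a sentence embedding, and an LSTM is stepped once per turn to produce a state distribution.

```python
import math
import random

random.seed(0)

EMB, HID, SLOTS = 4, 3, 2  # hypothetical sizes, not taken from the paper


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class TurnEncoder:
    """Bottom layer (CNN flavour): width-2 convolution, then max-over-time pooling."""

    def __init__(self, vocab):
        self.emb = {w: [random.uniform(-0.5, 0.5) for _ in range(EMB)] for w in vocab}
        self.unk = [0.0] * EMB  # zero vector for unknown words and padding
        self.filters = rand_matrix(HID, 2 * EMB)

    def encode(self, words):
        vecs = [self.emb.get(w, self.unk) for w in words]
        if len(vecs) < 2:  # pad so that at least one window exists
            vecs = vecs + [self.unk] * (2 - len(vecs))
        windows = [vecs[i] + vecs[i + 1] for i in range(len(vecs) - 1)]
        feats = [[math.tanh(a) for a in matvec(self.filters, win)] for win in windows]
        return [max(col) for col in zip(*feats)]  # one value per filter


class DialogLSTM:
    """Upper layer: one LSTM cell stepped once per dialog turn (biases omitted)."""

    def __init__(self, in_dim):
        self.w = {gate: rand_matrix(HID, in_dim + HID) for gate in "ifoc"}
        self.h = [0.0] * HID
        self.c = [0.0] * HID

    def step(self, x):
        z = x + self.h  # concatenate turn embedding and previous hidden state
        i = [sigmoid(a) for a in matvec(self.w["i"], z)]
        f = [sigmoid(a) for a in matvec(self.w["f"], z)]
        o = [sigmoid(a) for a in matvec(self.w["o"], z)]
        g = [math.tanh(a) for a in matvec(self.w["c"], z)]
        self.c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, self.c, i, g)]
        self.h = [oi * math.tanh(ci) for oi, ci in zip(o, self.c)]
        return self.h


def track(dialog, encoder, tracker, out_w):
    """Return one distribution over SLOTS hypothetical slot values per turn."""
    states = []
    for turn in dialog:
        sentence_embedding = encoder.encode(turn.split())
        hidden = tracker.step(sentence_embedding)
        states.append(softmax(matvec(out_w, hidden)))
    return states


dialog = ["i want cheap food", "in the north part of town"]
vocab = sorted({w for turn in dialog for w in turn.split()})
states = track(dialog, TurnEncoder(vocab), DialogLSTM(HID), rand_matrix(SLOTS, HID))
```

Because the upper LSTM carries its hidden and cell state across turns, the distribution emitted at turn t depends on every earlier turn without the whole history being fed in as one long word sequence. In this sketch, swapping `TurnEncoder` for a word-level LSTM would give the LSTM+LSTM variant.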
Acknowledgements
This work was supported by the 111 Project (no. B08004), the NSFC (no. 61273365), the Beijing Advanced Innovation Center for Imaging Technology, the Engineering Research Center of Information Networks of MOE, and ZTE.
Cite this article
Yang, G., Wang, X. Cascaded deep neural network models for dialog state tracking. Multimed Tools Appl 78, 9625–9643 (2019). https://doi.org/10.1007/s11042-018-6531-2
DOI: https://doi.org/10.1007/s11042-018-6531-2