Abstract
Slot filling and intent prediction are basic tasks in capturing the semantic frame of human utterances. Slots and intent are strongly correlated in semantic frame parsing. The intent type of an utterance is generally indicated by the words that carry slot tags (called slot words), and conversely, the intent type determines which categories of words should be filled as slots. However, this Intent-Slot correlation is rarely modeled explicitly in existing studies and hence may not be fully exploited. In this paper, we model the Intent-Slot correlation explicitly and propose a new framework for joint intent prediction and slot filling. First, we explore the effects of slot words on intent by differentiating them from the other words, and we recognize slot words by solving a sequence labeling task with a bi-directional long short-term memory (BiLSTM) model. Then, the slot recognition information is introduced into attention-based intent prediction and slot filling to improve the semantic results. In addition, we integrate the Slot-Gated mechanism into slot filling to model the dependency of slots on intent. Finally, we obtain slot recognition, intent prediction, and slot filling by training with joint optimization. Experimental results on the benchmark Airline Travel Information System (ATIS) and Snips datasets show that our Intent-Slot correlation model achieves state-of-the-art semantic frame performance with a lightweight structure.
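Three ingredients named in the abstract can be made concrete: binary slot-word targets for the slot recognition task, the Slot-Gated fusion of intent and slot context, and a joint objective. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. The helper names (`slot_word_labels`, `slot_gate`, `gated_slot_feature`, `joint_loss`) are hypothetical; slot-word targets are assumed to be derived from BIO tags (any tag other than `O` marks a slot word); the gate follows the formulation of Goo et al. (2018), g = Σ v · tanh(c_i^S + W · c^I); and the joint loss is assumed to be an unweighted sum of cross-entropies.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def slot_word_labels(bio_tags: Sequence[str]) -> List[int]:
    """Hypothetical helper: binary slot-recognition targets.

    Assumption: BIO-style tags, where "O" marks a word without a slot tag.
    Returns 1 for slot words and 0 for all other words.
    """
    return [0 if tag == "O" else 1 for tag in bio_tags]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product W @ x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def slot_gate(c_slot: Vector, c_intent: Vector, W: Matrix, v: Vector) -> float:
    """Scalar slot gate in the style of Goo et al. (2018).

    g = sum_k v[k] * tanh(c_slot[k] + (W @ c_intent)[k]),
    where c_slot is the slot context of word i and c_intent the intent context.
    """
    projected = matvec(W, c_intent)
    return sum(v_k * math.tanh(s_k + p_k)
               for v_k, s_k, p_k in zip(v, c_slot, projected))


def gated_slot_feature(h: Vector, c_slot: Vector, g: float) -> Vector:
    """Input to the slot classifier: h_i + c_i^S * g (before the softmax layer)."""
    return [h_k + c_k * g for h_k, c_k in zip(h, c_slot)]


def nll(probs: Sequence[float], gold: int) -> float:
    """Negative log-likelihood of the gold class under a probability vector."""
    return -math.log(probs[gold])


def joint_loss(intent_probs: Sequence[float], intent_gold: int,
               slot_probs: Sequence[Sequence[float]], slot_gold: Sequence[int],
               rec_probs: Sequence[Sequence[float]], rec_gold: Sequence[int]) -> float:
    """Assumed joint objective: unweighted sum of the intent, slot filling,
    and slot recognition cross-entropies for one utterance."""
    loss = nll(intent_probs, intent_gold)
    loss += sum(nll(p, y) for p, y in zip(slot_probs, slot_gold))
    loss += sum(nll(p, y) for p, y in zip(rec_probs, rec_gold))
    return loss


if __name__ == "__main__":
    # ATIS-style example: "flights from boston to denver"
    tags = ["O", "O", "B-fromloc.city_name", "O", "B-toloc.city_name"]
    print(slot_word_labels(tags))  # [0, 0, 1, 0, 1]

    # Toy 2-dimensional contexts, purely illustrative values.
    g = slot_gate(c_slot=[0.2, -0.1], c_intent=[1.0, 0.5],
                  W=[[0.3, 0.0], [0.0, 0.3]], v=[1.0, 1.0])
    print(gated_slot_feature(h=[0.4, 0.6], c_slot=[0.2, -0.1], g=g))
```

A scalar gate keeps the intent-to-slot coupling cheap: a single weighted sum per word decides how much of the slot context enters the slot classifier, which fits the lightweight structure the abstract emphasizes. In a trained model, the probability vectors passed to `joint_loss` would come from softmax layers over the BiLSTM and attention outputs.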
About this article
Cite this article
Fan, JF., Wang, ML., Li, CL. et al. Intent-Slot Correlation Modeling for Joint Intent Prediction and Slot Filling. J. Comput. Sci. Technol. 37, 309–319 (2022). https://doi.org/10.1007/s11390-020-0326-4