Joint Intent Detection Model for Task-oriented Human-Computer Dialogue System using Asynchronous Training

Published: 09 May 2023

Abstract

Accurately understanding low-resource languages is at the core of task-oriented human-computer dialogue systems. Language understanding consists of two sub-tasks, i.e., intent detection and slot filling. Intent detection still faces challenges due to semantic ambiguity and implicit intentions in users’ input. Moreover, modeling intent detection and slot filling separately significantly decreases the correctness and relevance between questions and answers. To address these issues, we propose a joint intent detection method using an asynchronous training strategy. The proposed method first encodes local text information extracted by a CNN together with the relationships among words emphasized by an attention structure. A joint intent detection model with an asynchronous training strategy is then proposed, which either fuses the hidden states of the intent detection and slot filling layers or adopts the key information to fine-tune the whole network, greatly increasing the relevance of the intent detection and slot filling subtasks. The accuracies achieved by the proposed method on an open-source airline travel dataset and a self-collected electricity service dataset, i.e., ATIS and ECSF, are 97.49% and 89.68%, respectively, which demonstrates the effectiveness of joint learning and asynchronous training.
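
To make the described architecture concrete, the sketch below (in PyTorch) shows one way a CNN branch for local text features and a self-attention branch for word-relationship features can feed shared hidden states into both an intent head and a slot-filling head, with the two losses updated on alternating steps as one possible reading of the asynchronous training strategy. All layer sizes, label counts, and the alternation scheme are illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative sketch only: layer sizes, label counts, and the
# alternating update schedule are assumptions, not the paper's
# exact architecture or training procedure.
import torch
import torch.nn as nn

class JointNLU(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, num_filters=128,
                 num_intents=22, num_slots=120):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # CNN branch: local n-gram features of the utterance.
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size=3, padding=1)
        # Self-attention branch: relationships among words.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
        hidden = num_filters + embed_dim
        # Both heads read the same fused hidden states, so the
        # intent and slot subtasks share information.
        self.slot_head = nn.Linear(hidden, num_slots)
        self.intent_head = nn.Linear(hidden, num_intents)

    def forward(self, tokens):
        x = self.embed(tokens)                                  # (B, T, E)
        local = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        ctx, _ = self.attn(x, x, x)                             # (B, T, E)
        fused = torch.cat([local, ctx], dim=-1)                 # shared states
        slot_logits = self.slot_head(fused)                     # per token
        intent_logits = self.intent_head(fused.mean(dim=1))     # per utterance
        return intent_logits, slot_logits

# One possible reading of "asynchronous training": update on the two
# losses in alternating steps instead of summing them each step.
model = JointNLU(vocab_size=10_000)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

tokens = torch.randint(0, 10_000, (4, 16))   # toy batch of 4 utterances
intent_gold = torch.randint(0, 22, (4,))
slot_gold = torch.randint(0, 120, (4, 16))

for step in range(4):
    intent_logits, slot_logits = model(tokens)
    if step % 2 == 0:   # even steps: intent loss
        loss = ce(intent_logits, intent_gold)
    else:               # odd steps: slot loss
        loss = ce(slot_logits.reshape(-1, 120), slot_gold.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because both heads read the same fused hidden states, gradients from either subtask reshape the representation used by the other, which is the mechanism the abstract credits for increasing the relevance between intent detection and slot filling.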



Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 5
May 2023, 653 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3596451

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 9 May 2023
      • Online AM: 22 August 2022
      • Accepted: 9 August 2022
      • Revised: 1 August 2022
      • Received: 31 December 2021
Published in TALLIP Volume 22, Issue 5
