Abstract
Accurately understanding low-resource languages is central to task-oriented human-computer dialogue systems. Language understanding consists of two sub-tasks: intent detection and slot filling. Intent detection remains challenging because users’ input is often semantically ambiguous and carries implicit intentions. Moreover, modeling intent detection and slot filling separately significantly decreases the correctness and relevance between questions and answers. To address these issues, we propose a joint intent detection method using an asynchronous training strategy. The proposed method first encodes local text information extracted by a CNN together with the relationships among words emphasized by an attention structure. A joint intent detection model with an asynchronous training strategy is then built by either fusing the hidden states of the intent detection and slot filling layers or using the key information to fine-tune the whole network, which greatly increases the relevance between the intent detection and slot filling sub-tasks. Tested on an open-source airline travel dataset (ATIS) and a self-collected electricity service dataset (ECSF), the proposed method achieves accuracies of 97.49% and 89.68%, respectively, which demonstrates the effectiveness of joint learning and asynchronous training.
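The two sub-tasks and the idea of fusing their hidden states can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Example` class, the `mean_pool`, `fuse`, and `asynchronous_steps` helpers, the toy vectors, and the sample utterance with its labels are all hypothetical. Fusion by concatenation with a mean-pooled slot representation is an assumption, and so is the alternating per-task update order, since the abstract does not specify either.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Example:
    """One utterance annotated for both sub-tasks."""
    tokens: List[str]
    slots: List[str]  # slot filling: one BIO tag per token
    intent: str       # intent detection: one label per utterance


def mean_pool(states: List[List[float]]) -> List[float]:
    """Average equal-length vectors into a single vector."""
    return [sum(column) / len(states) for column in zip(*states)]


def fuse(intent_state: List[float], slot_states: List[List[float]]) -> List[float]:
    """Assumed fusion: concatenate the intent vector with the pooled slot vectors."""
    return intent_state + mean_pool(slot_states)


def asynchronous_steps(num_batches: int) -> Iterator[Tuple[int, str]]:
    """Assumed schedule: per batch, update on the intent loss, then on the slot loss."""
    for batch in range(num_batches):
        yield batch, "intent"
        yield batch, "slot"


# Illustrative airline-travel utterance; labels follow the usual BIO convention.
example = Example(
    tokens=["show", "flights", "from", "boston", "to", "denver"],
    slots=["O", "O", "O", "B-fromloc.city_name", "O", "B-toloc.city_name"],
    intent="atis_flight",
)

# Toy 2-dimensional hidden states (hypothetical values, not model outputs).
intent_state = [0.5, -1.0]
slot_states = [[1.0, 0.0], [0.0, 2.0], [2.0, 1.0]]

fused = fuse(intent_state, slot_states)
print(fused)  # [0.5, -1.0, 1.0, 1.0]
print(list(asynchronous_steps(2)))
```

In a real model the fused vector would feed the intent classifier, so that token-level slot evidence can inform the utterance-level decision.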
Index Terms
- Joint Intent Detection Model for Task-oriented Human-Computer Dialogue System using Asynchronous Training