Abstract
Intent detection and slot filling are core tasks in natural language understanding (NLU) for task-oriented dialogue systems. However, current models struggle with large numbers of intent categories, slot types, and domain classes, as well as a shortage of well-annotated datasets, particularly in Chinese. We therefore propose a domain-aware model with multi-perspective, multi-positive contrastive learning. First, we adopt self-supervised contrastive learning with multiple perspectives and multiple positive instances, which separates the representations of positive and negative instances from the domain, intent, and slot perspectives and fuses information from additional positive instances to improve classification. Second, the proposed domain-aware model defines domain-level units at the decoding layer, so that intents and slots are predicted conditioned on domain features, which greatly reduces the intent and slot search space. In addition, we design a dual-stage attention mechanism to capture implicitly shared information between intents and slots. We also propose a data augmentation method that adds noise at the embedding layer, applies fine-grained augmentation techniques, and filters biased samples with a similarity threshold. The model is deployed in a real task-oriented dialogue system and compared with other NLU models; experimental results demonstrate that it outperforms these models on NLU performance.
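To make the multi-perspective, multi-positive objective concrete, the sketch below shows a generic multi-positive InfoNCE-style loss in PyTorch: every in-batch sample that shares a label with the anchor is treated as a positive, and the per-anchor log-probabilities of all positives are averaged. The temperature value, the perspective-specific embeddings and labels (`z_dom`, `y_dom`, etc.), and the exact loss form are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def multi_positive_contrastive_loss(embeddings: torch.Tensor,
                                     labels: torch.Tensor,
                                     temperature: float = 0.1) -> torch.Tensor:
    """Generic multi-positive InfoNCE-style loss (a sketch, not the paper's
    exact formulation): in-batch samples with the same label as the anchor
    are positives; all other samples act as negatives."""
    z = F.normalize(embeddings, dim=-1)                  # (B, D) unit vectors
    sim = z @ z.t() / temperature                        # (B, B) scaled similarities
    batch = z.size(0)
    self_mask = torch.eye(batch, dtype=torch.bool, device=z.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask

    sim = sim.masked_fill(self_mask, float("-inf"))      # exclude the anchor itself
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0                               # anchors with >= 1 positive
    mean_log_prob_pos = (log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)[valid]
                         / pos_counts[valid])
    return -mean_log_prob_pos.mean()


# Hypothetical usage: one loss per perspective (domain, intent, slot), summed,
# e.g. loss = sum(multi_positive_contrastive_loss(z, y)
#                 for z, y in [(z_dom, y_dom), (z_int, y_int), (z_slot, y_slot)])
```

Under this reading, the multi-perspective objective is simply the sum of one such loss per labeling granularity, which is one plausible way to "space" positives and negatives along the domain, intent, and slot views simultaneously.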
Data availability and access
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (12273003).
Author information
Contributions
Di Wang: Methodology, Validation, Formal analysis, Writing - original draft, Visualization. Qingjian Ni: Writing - review & editing.
Ethics declarations
Competing Interests
The authors declare that they have no conflict of interest.
Ethical and informed consent for data used
The authors state that all data used in this study were obtained ethically and in accordance with informed consent protocols. The dataset used in this study was obtained from NIO, and its use for academic or research purposes does not violate any copyright, intellectual property, or data protection legislation.
About this article
Cite this article
Wang, D., Ni, Q. A domain-aware model with multi-perspective contrastive learning for natural language understanding. Appl Intell 55, 218 (2025). https://doi.org/10.1007/s10489-024-06154-x