From context-aware to knowledge-aware: Boosting OOV tokens recognition in slot tagging with background knowledge
Section snippets
2021
Slot tagging is a critical component of spoken language understanding (SLU) in dialogue systems. It aims at parsing semantic concepts from user utterances. For instance, given an utterance from the ATIS [11] dataset containing the word lunch, a slot tagging model should identify lunch as a meal_description slot type. Given sufficient training data, recent neural-based models [19], [15], [16], [8], [10] have achieved remarkably good results.
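Slot tagging of this kind is conventionally framed as BIO-style sequence labeling. A minimal sketch follows; the utterance, tokenization, and slot names are illustrative, in the style of ATIS, not taken from the paper:

```python
# BIO scheme: B-<slot> opens a span, I-<slot> continues it, O is outside any slot.
# Hypothetical ATIS-style utterance; slot names are illustrative.
tokens = ["show", "me", "flights", "serving", "lunch", "to", "new", "york"]
tags = ["O", "O", "O", "O", "B-meal_description",
        "O", "B-toloc.city_name", "I-toloc.city_name"]

def extract_slots(tokens, tags):
    """Collect (slot_type, surface_text) spans from a BIO tag sequence."""
    spans, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # a new span starts
            if current:
                spans.append(current)
            current = (tag[2:], [tok])
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(tok)        # continue the open span
        else:                             # O tag or inconsistent I- tag
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(slot, " ".join(words)) for slot, words in spans]

print(extract_slots(tokens, tags))
# → [('meal_description', 'lunch'), ('toloc.city_name', 'new york')]
```

The BIO encoding is what lets a per-token classifier recover multi-word slot values such as city names.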
However, most prior work merely focuses on how to capture …
Slot tagging
Slot tagging can be treated as a sequence labeling task, and an early approach is conditional random fields (CRF) [24]. Neural-based models have achieved good performance with the availability of adequate training data. [19] proposes several RNN-based models for slot tagging that outperform CRF-based models. [17] proposes an LSTM-CNN-CRF model, which induces character representations using a convolutional neural network. [8], [10], [25] employ joint learning of slot tagging and intent …
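For readers unfamiliar with the CRF decoding step mentioned above, a linear-chain Viterbi decode can be sketched in a few lines. The scores below are toy values, and this is a generic illustration rather than any cited model's implementation:

```python
def viterbi(emissions, transitions, tagset):
    """Return the highest-scoring tag path.
    emissions: per-token dict {tag: score}; transitions: {(prev, cur): score}."""
    # Initialize with the first token's emission scores.
    best = {t: (emissions[0][t], [t]) for t in tagset}
    for emit in emissions[1:]:
        nxt = {}
        for cur in tagset:
            # Best previous tag to transition from, by accumulated score.
            prev = max(tagset, key=lambda p: best[p][0] + transitions[(p, cur)])
            score = best[prev][0] + transitions[(prev, cur)] + emit[cur]
            nxt[cur] = (score, best[prev][1] + [cur])
        best = nxt
    return max(best.values(), key=lambda v: v[0])[1]

# Toy example: two tokens, tags O and B; B→B transitions are penalized.
tagset = ["O", "B"]
emissions = [{"O": 0.0, "B": 2.0}, {"O": 1.0, "B": 0.0}]
transitions = {(p, c): 0.0 for p in tagset for c in tagset}
transitions[("B", "B")] = -5.0
print(viterbi(emissions, transitions, tagset))  # → ['B', 'O']
```

The transition scores are what distinguish a CRF layer from independent per-token classification: an implausible tag bigram can veto a locally high-scoring tag.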
Methodology
In this work, we consider the slot tagging task in the few-shot scenario, especially for OOV tokens. Given a sequence of n tokens x = (x1, …, xn), our goal is to predict the corresponding tag sequence y = (y1, …, yn). This section first explains our BERT-based model and then introduces the proposed knowledge integration mechanism for inducing background knowledge. The overall model architecture is illustrated in Fig. 2.
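The snippet does not spell out the integration mechanism itself, but the general pattern of fusing a contextual token representation with embeddings of its KB neighbours (e.g. WordNet synonyms or hypernyms) can be sketched as follows. The attention-plus-residual form is an assumption for illustration, not the paper's exact architecture:

```python
import math

def fuse_with_knowledge(context_vec, knowledge_vecs):
    """Attend over KB-neighbour embeddings with the contextual vector as the
    query, then add the attended summary back (residual fusion).
    Illustrative sketch only, not the paper's exact mechanism."""
    if not knowledge_vecs:
        return context_vec  # token absent from the KB: fall back to context alone
    # Dot-product attention scores between context and each knowledge vector.
    scores = [sum(c * k for c, k in zip(context_vec, kv)) for kv in knowledge_vecs]
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of knowledge vectors.
    summary = [sum(w * kv[i] for w, kv in zip(weights, knowledge_vecs))
               for i in range(len(context_vec))]
    return [c + s for c, s in zip(context_vec, summary)]
```

The intuition is that an OOV token with no reliable contextual statistics can still be tagged via the embeddings of its KB relatives.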
Datasets
To evaluate our approach, we conduct experiments on two public benchmark datasets, ATIS [11] and Snips [4]. ATIS contains 4,478 utterances in the training set and 893 utterances in the test set, while Snips contains 13,084 and 700 utterances, respectively. The percentage of OOV words between the training and test datasets is 0.77% (ATIS) and 5.95% (Snips). Samples in Snips are from different topics (see Table 1), such as getting weather and booking a restaurant, resulting in a larger …
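OOV percentages of this kind can be computed from the raw splits with a type-level calculation like the following (whether the paper counts word types or tokens is not stated in the snippet, so the definition here is an assumption):

```python
def oov_rate(train_sents, test_sents):
    """Fraction of test-set word types that never occur in the training set."""
    train_vocab = {w for sent in train_sents for w in sent}
    test_vocab = {w for sent in test_sents for w in sent}
    return len(test_vocab - train_vocab) / len(test_vocab)

# Tiny illustration with made-up sentences.
train = [["book", "a", "flight"], ["show", "flights"]]
test = [["book", "a", "hotel", "tonight"]]
print(oov_rate(train, test))  # → 0.5 ("hotel" and "tonight" are unseen)
```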
Overall results
We display the experiment results in Table 3. We validate the performance improvements with statistical significance tests for all experiments, performing a one-tailed t-test to measure whether the results of the proposed model are significantly better than those of the baselines. The t-test indicates that the improvement is statistically significant. Table 3 shows that our proposed knowledge integration mechanism significantly outperforms the baselines on both datasets, demonstrating …
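A one-tailed paired t-test of this kind reduces to a t statistic over per-run score differences. The sketch below uses hypothetical F1 scores, not the paper's numbers:

```python
import math
from statistics import mean, stdev

def paired_t_statistic(model_scores, baseline_scores):
    """t statistic for a one-tailed paired t-test (model > baseline).
    Compare it against the critical value for n-1 degrees of freedom
    at the chosen significance level to make the decision."""
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical F1 scores from four runs of model vs. baseline.
model = [0.90, 0.92, 0.91, 0.93]
baseline = [0.88, 0.90, 0.89, 0.90]
print(paired_t_statistic(model, baseline))  # ≈ 9.0, well above typical critical values
```

Pairing the runs before testing removes between-run variance that both systems share, which is why the paired form is standard for comparing models on the same test set.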
Conclusion
We present a novel knowledge integration mechanism that incorporates a background KB and deep contextual representations to facilitate the few-shot slot tagging task. Experiments confirm the effectiveness of modeling explicit lexical relations, which previous work has not yet explored. Moreover, we find our method delivers greater benefits in data-scarce scenarios. We hope this provides new guidance for future slot tagging work.
CRediT authorship contribution statement
Keqing He: Conceptualization, Methodology, Software, Writing - original draft, Writing - review & editing. Yuanmeng Yan: Data curation, Visualization, Investigation. Weiran Xu: Supervision, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was partially supported by National Key R&D Program of China No. 2019YFF0303300 and Subject II No. 2019YFF0303302, MoE-CMCC "Artificial Intelligence" Project No. MCM20190701.
Keqing He received the B.S. degree in information engineering from Beijing University of Posts and Telecommunications, Beijing, China, in 2018. He is currently pursuing an M.S. degree in information engineering at Beijing University of Posts and Telecommunications. His research interests include deep learning, natural language understanding, and dialogue system.
References (33)
- A. Bordes, J. Weston, R. Collobert, Y. Bengio, Learning structured embeddings of knowledge bases
- Q. Chen, X. Zhu, Z.H. Ling, D. Inkpen, S. Wei, Neural natural language inference models enhanced with external...
- Q. Chen, Z. Zhuo, W. Wang, BERT for joint intent classification and slot filling, 2019. arXiv preprint...
- A. Coucke, A. Saade, A. Ball, T. Bluche, A. Caulier, D. Leroy, C. Doumouro, T. Gisselbrecht, F. Caltagirone, T. Lavril,...
- R. Das, T. Munkhdalai, X. Yuan, A. Trischler, A. McCallum, Building dynamic knowledge graphs from text using machine...
- J. Devlin, M.W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language...
- F. Dugas, E. Nichols, DeepNNNER: applying BLSTM-CNNs and extended lexicons to named entity recognition in tweets, in:...
- C.-W. Goo et al., Slot-gated modeling for joint slot filling and intent prediction
- K. Guu, J. Miller, P. Liang, Traversing knowledge graphs in vector space, 2015. arXiv preprint...
- H. E et al., A novel bi-directional interrelated model for joint intent detection and slot filling
Cited by (6)
A code-mixed task-oriented dialog dataset for medical domain
2023, Computer Speech and Language
Citation excerpt: For example, Basu et al. (2022b), Zhang et al. (2022b) and Bai et al. (2022) used contrastive learning with pre-trained BERT LMs for developing NLU models. He et al. (2021) employed commonsense knowledge from WordNet to improve the performance of slot tagging. For dialog state tracking, the research progressed towards explicit slot and utterance alignments, schema graphs, and BERT-based encoder-decoder architectures (Wang et al., 2022; Feng et al., 2022b, 2021).
The KITMUS Test: Evaluating Knowledge Integration from Multiple Sources
2023, Proceedings of the Annual Meeting of the Association for Computational Linguistics
CAE: Mechanism to Diminish the Class Imbalanced in SLU Slot Filling Task
2022, Communications in Computer and Information Science
Combined Coverage, Attention and Pointer Networks for Improving Slot Filling in Spoken Language Understanding
2021, Proceedings of 2021 7th IEEE International Conference on Network Intelligence and Digital Content, IC-NIDC 2021
Weiran Xu was born in Beijing, China in 1975. He received the B.S. and M.S. degrees in information engineering from the Dalian University of Technology, Dalian, in 2000 and the Ph.D. degree in information engineering from Beijing University of Posts and Telecommunications, Beijing, in 2003. From 2003 to 2006, he was a lecturer with the Department of Information and Communication, Beijing University of Posts and Telecommunications. Since 2006, he has been an Associate Professor with the Department of Information and Communication, Beijing University of Posts and Telecommunications. He is the author of more than 50 articles. His research interests include machine learning, natural language understanding, dialogue system, web searching, and representation learning. Prof. Xu participated in many projects of the National Natural Science Foundation of China (NSFC), and he is also a Senior Member of the China Computer Federation.
Yuanmeng Yan received the B.S. degree in software engineering from Beijing University of Posts and Telecommunications, Beijing, China, in 2018. He is currently pursuing an M.S. degree in information engineering at Beijing University of Posts and Telecommunications. His research interests include machine learning, information retrieval, and dialogue system.