Neurocomputing

Volume 445, 20 July 2021, Pages 267-275
From context-aware to knowledge-aware: Boosting OOV tokens recognition in slot tagging with background knowledge

https://doi.org/10.1016/j.neucom.2021.01.134

Abstract

Neural context-aware models for the slot tagging task in language understanding have achieved state-of-the-art performance, especially deep contextualized models such as ELMo and BERT. However, the presence of out-of-vocabulary (OOV) words significantly degrades the performance of neural models, especially in few-shot scenarios. In this paper, we propose a novel knowledge-aware slot tagging model that integrates contextual representations of the input text with large-scale lexical background knowledge. In addition, we use multi-level graph attention to reason explicitly over lexical relations. We aim to leverage both the linguistic regularities captured by deep language models (LMs) and the high-quality background knowledge derived from curated knowledge bases (KBs). Consequently, our model can infer rare and unseen words in the test set by combining contextual semantics learned from the training data with lexical relations from the ontology. Experiments show that our proposed knowledge integration mechanism achieves consistent improvements across settings with different training-data sizes on two public benchmark datasets. We also show through detailed analysis that incorporating background knowledge effectively alleviates issues of data scarcity.

Section snippets

Introduction

Slot tagging is a critical component of spoken language understanding (SLU) in dialogue systems. It aims to parse semantic concepts from user utterances. For instance, given an utterance from the ATIS [11] dataset that contains the word lunch, a slot tagging model should identify lunch as a meal_description slot type. Given sufficient training data, recent neural models [19], [15], [16], [8], [10] have achieved remarkably good results.
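Concretely, slot tagging casts each token as a label prediction, typically with BIO-style tags. The toy example below is our own ATIS-style illustration (the tokens and the fromloc.city_name tag are assumptions, not drawn from the paper):

    # Slot tagging as token-level sequence labeling with BIO tags.
    # Hypothetical ATIS-style utterance, for illustration only.
    utterance = ["show", "flights", "with", "lunch", "from", "denver"]
    tags = ["O", "O", "O", "B-meal_description", "O", "B-fromloc.city_name"]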

However, most prior works merely focus on how to capture

Slot tagging

Slot tagging can be treated as a sequence labeling task; an early approach is conditional random fields (CRF) [24]. Neural models have achieved strong performance given adequate training data. [19] proposes several RNN-based models for slot tagging that outperform CRF-based models. [17] proposes an LSTM-CNN-CRF model, which induces character representations using a convolutional neural network. [8], [10], [25] employ joint learning of slot tagging and intent
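To make the sequence-labeling framing concrete, the sketch below implements generic Viterbi decoding for a linear-chain CRF; it is a minimal textbook version, not code from any of the cited systems, and the score matrices are assumed inputs:

    import numpy as np

    def viterbi_decode(emissions, transitions):
        # emissions: (T, L) per-token tag scores; transitions: (L, L) tag-to-tag scores.
        # Returns the highest-scoring tag sequence under a linear-chain CRF.
        T, L = emissions.shape
        score = emissions[0].copy()            # best score ending in each tag at step 0
        backptr = np.zeros((T, L), dtype=int)
        for t in range(1, T):
            total = score[:, None] + transitions + emissions[t][None, :]
            backptr[t] = total.argmax(axis=0)  # best previous tag for each current tag
            score = total.max(axis=0)
        path = [int(score.argmax())]
        for t in range(T - 1, 0, -1):          # follow back-pointers to recover the path
            path.append(int(backptr[t][path[-1]]))
        return path[::-1]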

Methodology

In this work, we consider the slot tagging task in the few-shot scenario, especially for OOV tokens. Given a sequence of $n$ tokens $X = \{x_i\}_{i=1}^{n}$, our goal is to predict the corresponding tag sequence $Y = \{y_i\}_{i=1}^{n}$. This section first explains our BERT-based model and then introduces the proposed knowledge integration mechanism for inducing background knowledge. The overall model architecture is illustrated in Fig. 2.
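Since this snippet omits the architectural details, the following is only a minimal sketch of the general shape such a model can take: a BERT encoder whose per-token states attend over embeddings of candidate KB concepts, with the knowledge summary concatenated before slot classification. The class name, the concept_embs/concept_mask inputs, and the single-level attention are all our assumptions; the paper's actual mechanism uses multi-level graph attention over lexical relations.

    import torch
    import torch.nn as nn
    from transformers import BertModel

    class KnowledgeAwareSlotTagger(nn.Module):
        # Hypothetical sketch, not the authors' implementation.
        def __init__(self, num_labels, kb_dim=300, model_name="bert-base-uncased"):
            super().__init__()
            self.bert = BertModel.from_pretrained(model_name)
            hidden = self.bert.config.hidden_size
            self.kb_proj = nn.Linear(kb_dim, hidden)       # map KB embeddings into BERT space
            self.classifier = nn.Linear(hidden * 2, num_labels)

        def forward(self, input_ids, attention_mask, concept_embs, concept_mask):
            # concept_embs: (B, T, K, kb_dim) candidate KB concepts per token
            # concept_mask: (B, T, K) marks real (1) vs. padded (0) candidates
            h = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
            c = self.kb_proj(concept_embs)                     # (B, T, K, H)
            scores = torch.einsum("bth,btkh->btk", h, c)       # token-concept affinities
            scores = scores.masked_fill(concept_mask == 0, -1e9)
            alpha = scores.softmax(dim=-1)                     # attention over concepts
            k = torch.einsum("btk,btkh->bth", alpha, c)        # per-token knowledge summary
            return self.classifier(torch.cat([h, k], dim=-1))  # per-token slot logits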

Datasets

To evaluate our approach, we conduct experiments on two public benchmark datasets, ATIS [11] and Snips [4]. ATIS contains 4,478 utterances in the training set and 893 in the test set, while Snips contains 13,084 and 700 utterances, respectively. The percentage of OOV words in the test set relative to the training set is 0.77% for ATIS and 5.95% for Snips. Samples in Snips come from different topics (see Table 1), such as getting the weather and booking a restaurant, resulting in a larger
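For reference, an OOV percentage between splits can be computed as below. This is one plausible counting convention (lower-cased word tokens); the snippet does not state whether the paper counts word types or tokens, so treat it as an assumption:

    def oov_rate(train_sents, test_sents):
        # Percentage of test word tokens never seen in the training vocabulary.
        train_vocab = {w.lower() for s in train_sents for w in s.split()}
        test_tokens = [w.lower() for s in test_sents for w in s.split()]
        oov = sum(w not in train_vocab for w in test_tokens)
        return 100.0 * oov / len(test_tokens)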

Overall results

We report the experimental results in Table 3. We validate the performance improvements with statistical significance tests for all experiments: a one-tailed t-test measures whether the results of the proposed model are significantly better than those of the baselines, and it indicates that the improvement is significant at p < 0.05. Table 3 shows that our proposed knowledge integration mechanism significantly outperforms the baselines on both datasets, demonstrating
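A test of this kind can be reproduced along the following lines; the per-run F1 scores here are made-up placeholders, and halving the two-sided p-value yields the one-tailed result:

    from scipy import stats

    proposed = [95.8, 95.6, 96.0, 95.7, 95.9]   # hypothetical per-run F1 scores
    baseline = [95.1, 95.0, 95.3, 94.9, 95.2]
    t, p_two_sided = stats.ttest_ind(proposed, baseline)
    p_one_sided = p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2
    print(f"t = {t:.3f}, one-tailed p = {p_one_sided:.4f}")  # significant if p < 0.05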

Conclusion

We present a novel knowledge integration mechanism that incorporates background KBs and deep contextual representations to facilitate the few-shot slot tagging task. Experiments confirm the effectiveness of modeling explicit lexical relations, which previous work has not explored. Moreover, we find that our method delivers larger gains in data-scarce scenarios. We hope this work provides new guidance for future slot tagging research.

CRediT authorship contribution statement

Keqing He: Conceptualization, Methodology, Software, Writing - original draft, Writing - review & editing. Yuanmeng Yan: Data curation, Visualization, Investigation. Weiran Xu: Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was partially supported by the National Key R&D Program of China (No. 2019YFF0303300 and Subject II No. 2019YFF0303302) and the MoE-CMCC "Artificial Intelligence" Project (No. MCM20190701).


References (33)

  • A. Bordes et al., Learning structured embeddings of knowledge bases.
  • Q. Chen, X. Zhu, Z.H. Ling, D. Inkpen, S. Wei, Neural natural language inference models enhanced with external...
  • Q. Chen, Z. Zhuo, W. Wang, BERT for joint intent classification and slot filling, 2019, arXiv preprint...
  • A. Coucke, A. Saade, A. Ball, T. Bluche, A. Caulier, D. Leroy, C. Doumouro, T. Gisselbrecht, F. Caltagirone, T. Lavril,...
  • R. Das, T. Munkhdalai, X. Yuan, A. Trischler, A. McCallum, Building dynamic knowledge graphs from text using machine...
  • J. Devlin, M.W. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language...
  • F. Dugas, E. Nichols, DeepNNNER: applying BLSTM-CNNs and extended lexicons to named entity recognition in tweets, in:...
  • C.W. Goo et al., Slot-gated modeling for joint slot filling and intent prediction.
  • K. Guu, J. Miller, P. Liang, Traversing knowledge graphs in vector space, 2015, arXiv preprint...
  • E. Haihong et al., A novel bi-directional interrelated model for joint intent detection and slot filling.
  • C.T. Hemphill, J.J. Godfrey, G.R. Doddington, The ATIS spoken language systems pilot corpus, in: Speech and Natural...
  • D.P. Kingma, J. Ba, Adam: a method for stochastic optimization, CoRR abs/1412.6980...
  • S. Lee, R. Jha, Zero-shot adaptive transfer for conversational language understanding, in: AAAI,...
  • B.Y. Lin, X. Chen, J. Chen, X. Ren, KagNet: knowledge-aware graph networks for commonsense reasoning, 2019, arXiv...
  • B. Liu, I. Lane, Recurrent neural network structured output prediction for spoken language understanding, in: Proc....
  • B. Liu, I. Lane, Attention-based recurrent neural network models for joint intent detection and slot filling, 2016,...

    Keqing He received the B.S. degree in information engineering from Beijing University of Posts and Telecommunications, Beijing, China, in 2018. He is currently pursuing an M.S. degree in information engineering at Beijing University of Posts and Telecommunications. His research interests include deep learning, natural language understanding, and dialogue system.

    Weiran Xu was born in Beijing, China in 1975. He received the B.S. and M.S. degrees in information engineering from the Dalian University of Technology, Dalian, in 2000 and the Ph.D. degree in information engineering from Beijing University of Posts and Telecommunications, Beijing, in 2003. From 2003 to 2006, he was a lecturer with the Department of Information and Communication, Beijing University of Posts and Telecommunications. Since 2006, he has been an Associate Professor with the Department of Information and Communication, Beijing University of Posts and Telecommunications. He is the author of more than 50 articles. His research interests include machine learning, natural language understanding, dialogue system, web searching, and representation learning. Prof. Xu participated in many projects of the National Natural Science Foundation of China (NSFC), and he is also a Senior Member of the China Computer Federation.

    Yuanmeng Yan received the B.S. degree in software engineering from Beijing University of Posts and Telecommunications, Beijing, China, in 2018. He is currently pursuing an M.S. degree in information engineering at Beijing University of Posts and Telecommunications. His research interests include machine learning, information retrieval, and dialogue system.
