
Joint intent detection and slot filling using weighted finite state transducer and BERT


Abstract

Intent detection and slot filling are the two most essential tasks of natural language understanding (NLU). Deep neural models have produced impressive results on these tasks; however, their predictive accuracy depends heavily on massive amounts of supervised data, and in many applications collecting high-quality labeled data is an expensive and time-consuming process. This paper proposes the WFST-BERT model, which augments the fine-tuning of a BERT-like architecture with a weighted finite-state transducer (WFST) to reduce the need for massive supervised data. WFST-BERT employs regular expression (RE) rules to encode domain knowledge and a pre-trained BERT model to generate contextual representations of user sentences. In particular, the model converts the REs into a trainable weighted finite-state transducer, which can generate decent predictions when limited or no training examples are available. Moreover, the BERT contextual representation is combined with the WFST and trained simultaneously on supervised data using a gradient descent algorithm. Experimental results on the ATIS dataset show that, in limited-data settings, the F1-score of WFST-BERT improves by around 1.8% and 1.3% for intent detection, and by 0.9% and 0.7% for slot filling, compared to its counterparts, the RE-NN and JointBERT models. Further, in the full-data setting, the proposed model achieves better recall and F1-score than state-of-the-art models.
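
To make the mechanism concrete, the following is a minimal sketch of the idea, not the authors' implementation: the class names, the number of WFST states, the random initialisation standing in for the RE-compiled automaton, and the additive fusion of WFST and BERT scores are all illustrative assumptions, and only the intent-detection branch is shown.

```python
# Minimal sketch of the WFST-BERT idea (NOT the authors' released code).
# Assumptions: class names, num_states, random weights in place of the
# RE-compiled automaton, and additive score fusion are illustrative only.
import torch
import torch.nn as nn
from transformers import BertModel

class WFSTLayer(nn.Module):
    """Trainable WFST scorer: one transition matrix per vocabulary symbol.

    In the paper these weights would be initialised from automata compiled
    out of domain REs (e.g. a rule matching 'flights from ... to ...' tied
    to a flight intent); here they are random for brevity.
    """
    def __init__(self, vocab_size, num_states, num_intents):
        super().__init__()
        self.trans = nn.Parameter(0.01 * torch.randn(vocab_size, num_states, num_states))
        start = torch.zeros(num_states)
        start[0] = 1.0                       # state 0 is the start state
        self.start = nn.Parameter(start)
        self.final = nn.Parameter(0.01 * torch.randn(num_states, num_intents))

    def forward(self, token_ids):            # token_ids: (batch, seq_len)
        state = self.start.expand(token_ids.size(0), -1)    # (batch, S)
        for t in range(token_ids.size(1)):
            # Advance the state distribution by the matrix of token t.
            m = self.trans[token_ids[:, t]]                 # (batch, S, S)
            state = torch.bmm(state.unsqueeze(1), m).squeeze(1)
        return state @ self.final                           # (batch, intents)

class WFSTBert(nn.Module):
    """Fuses BERT intent logits with WFST scores; both trained jointly."""
    def __init__(self, num_states=32, num_intents=21):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.wfst = WFSTLayer(self.bert.config.vocab_size, num_states, num_intents)
        self.intent_head = nn.Linear(self.bert.config.hidden_size, num_intents)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # Additive fusion is an assumption; both terms receive gradients.
        return self.intent_head(out.pooler_output) + self.wfst(input_ids)
```

Under this reading, the RE-derived transition weights can produce usable intent scores before any fine-tuning, which explains the cold-start behaviour, while joint gradient-descent training lets the BERT term take over as labeled data grows.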




Author information

Corresponding author

Correspondence to Waheed Ahmed Abro.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Abro, W.A., Qi, G., Aamir, M. et al. Joint intent detection and slot filling using weighted finite state transducer and BERT. Appl Intell 52, 17356–17370 (2022). https://doi.org/10.1007/s10489-022-03295-9

