
Joint Slot Filling and Intent Detection in Spoken Language Understanding by Hybrid CNN-LSTM Model

Published: 04 January 2021

Abstract

We investigate the use of hybrid convolutional and long short-term memory (LSTM) neural networks for joint slot filling and intent detection in spoken language understanding. We propose a novel model that combines convolutional neural networks, which can detect complex features in the input sequences by applying filters to frames of these inputs, with recurrent neural networks, which can keep track of both long- and short-term dependencies in the input sequences. We build a joint model for slot filling and intent detection because we believe there is a strong relation between the intent and the semantic slots. A joint model can capture this relation and exploit it to improve the prediction results. We use the Airline Travel Information System (ATIS) dataset to measure the performance of our model and compare it with the results of other models, as this dataset has become one of the most popular benchmarks for spoken language understanding.
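To make the joint task concrete, the following minimal Python sketch shows what a joint prediction for one utterance could look like: one slot tag per token plus a single intent label for the whole utterance. It is an illustration only, not the authors' model or code. The BIO tagging scheme, the `JointPrediction` and `extract_slots` names, and the example utterance with its labels are assumptions chosen in the style of ATIS annotations; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class JointPrediction:
    """Hypothetical container for the two outputs of a joint model."""
    tokens: List[str]
    slot_tags: List[str]  # one BIO tag per token (slot filling)
    intent: str           # one label per utterance (intent detection)


def extract_slots(tokens: List[str], slot_tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse per-token BIO tags into (slot_name, text) pairs."""
    if len(tokens) != len(slot_tags):
        raise ValueError("tokens and slot_tags must have the same length")

    slots: List[Tuple[str, str]] = []
    name, words = None, []
    for token, tag in zip(tokens, slot_tags):
        if tag.startswith("B-"):
            # A new slot begins; close the previous one if it is still open.
            if name is not None:
                slots.append((name, " ".join(words)))
            name, words = tag[2:], [token]
        elif tag.startswith("I-") and name == tag[2:]:
            # Continuation of the currently open slot.
            words.append(token)
        else:
            # "O" tag (or an inconsistent "I-" tag): close any open slot.
            if name is not None:
                slots.append((name, " ".join(words)))
            name, words = None, []
    if name is not None:
        slots.append((name, " ".join(words)))
    return slots


# Illustrative utterance with assumed ATIS-style labels.
example = JointPrediction(
    tokens="show flights from boston to new york".split(),
    slot_tags=[
        "O", "O", "O",
        "B-fromloc.city_name",
        "O",
        "B-toloc.city_name", "I-toloc.city_name",
    ],
    intent="atis_flight",
)

print(example.intent, extract_slots(example.tokens, example.slot_tags))
```

In this example the intent (a flight query) and the slots (departure and arrival cities) are clearly related, which is the kind of dependency a joint model is meant to exploit.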

Cited By

  • (2023) Formation of environmental research competencies of foreign pre-masters’ students for sustainable region development. E3S Web of Conferences 458 (06018). https://doi.org/10.1051/e3sconf/202345806018. Online publication date: 7-Dec-2023
  • (2023) ReCoMIF. Information Fusion 96:C (192-201). https://doi.org/10.1016/j.inffus.2023.03.016. Online publication date: 1-Aug-2023
  • (2022) Intent Detection in Urdu Queries using Fine-tuned BERT models. 2022 16th International Conference on Open Source Systems and Technologies (ICOSST) (1-6). https://doi.org/10.1109/ICOSST57195.2022.10016834. Online publication date: 14-Dec-2022

    Published In

    CCRIS '20: Proceedings of the 2020 1st International Conference on Control, Robotics and Intelligent System
    October 2020
    217 pages
    ISBN:9781450388054
    DOI:10.1145/3437802

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. CNN
    2. Intent detection
    3. Joint model
    4. RNN
    5. Slot filling

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    CCRIS 2020
