Joint Slot Filling and Intent Detection in Spoken Language Understanding by Hybrid CNN-LSTM Model

Authors:

Pavel Drobintsev,

Igor Chernoruckiy,

Andrey Filchenkov

CCRIS '20: Proceedings of the 2020 1st International Conference on Control, Robotics and Intelligent System

Pages 112 - 117

https://doi.org/10.1145/3437802.3437822

Published: 04 January 2021

Abstract

We investigate the use of hybrid convolutional and long short-term memory (LSTM) neural networks for joint slot filling and intent detection in spoken language understanding. We propose a novel model that combines convolutional neural networks, which can detect complex features in input sequences by applying filters to frames of those inputs, with recurrent neural networks, which can keep track of both long- and short-term dependencies in the input sequences. We build a joint model for slot filling and intent detection because we believe there is a strong relation between the intent and the semantic slots; a joint model can capture this relation and exploit it to improve prediction results. We use the Airline Travel Information System (ATIS) dataset to measure the performance of our model and to compare it with the results of other models, as it has become one of the most popular datasets for the spoken language understanding problem.
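To make the joint task concrete, the following minimal sketch shows what a joint prediction looks like for one utterance: a single intent label for the whole sentence plus one slot tag per token. It does not implement the authors' CNN-LSTM model. The names `JointPrediction` and `extract_slots` are hypothetical, the utterance is invented for illustration rather than taken from the paper, and the labels only assume the IOB-style tagging commonly used with ATIS.

```python
from dataclasses import dataclass


@dataclass
class JointPrediction:
    """Hypothetical container: one intent per utterance, one slot tag per token."""
    tokens: list[str]
    intent: str
    slot_tags: list[str]  # IOB scheme: "B-<slot>", "I-<slot>", or "O"


def extract_slots(pred: JointPrediction) -> list[tuple[str, str]]:
    """Group consecutive B-/I- tags into (slot name, text span) pairs."""
    if len(pred.tokens) != len(pred.slot_tags):
        raise ValueError("need exactly one slot tag per token")
    spans: list[tuple[str, list[str]]] = []
    open_name = None  # slot name of the span currently being extended
    for token, tag in zip(pred.tokens, pred.slot_tags):
        if tag.startswith("B-"):
            open_name = tag[2:]
            spans.append((open_name, [token]))
        elif tag.startswith("I-") and open_name == tag[2:]:
            spans[-1][1].append(token)
        else:
            open_name = None  # "O" or a stray I- tag closes the current span
    return [(name, " ".join(words)) for name, words in spans]


# Illustrative utterance (an assumption, not an example from the paper).
example = JointPrediction(
    tokens="show flights from new york to denver".split(),
    intent="atis_flight",
    slot_tags=[
        "O", "O", "O",
        "B-fromloc.city_name", "I-fromloc.city_name",
        "O",
        "B-toloc.city_name",
    ],
)
slots = extract_slots(example)
```

In this example the intent (a flight query) and the slots (origin and destination cities) are clearly related, which is the kind of dependency a joint model is meant to exploit.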

Cited By

Krasnoshchekov V, Rud’ V, Davydov R, Semenova N, Diuldin M, Kharlamova N, Ionkina E, and Shimkovich E. 2023. Formation of environmental research competencies of foreign pre-masters’ students for sustainable region development. E3S Web of Conferences 458 (06018). Online publication date: 7-Dec-2023. https://doi.org/10.1051/e3sconf/202345806018

Xie B, Jia X, Song X, Zhang H, Chen B, Jiang B, Wang Y, and Pan Y. 2023. ReCoMIF. Information Fusion 96:C (192-201). Online publication date: 1-Aug-2023. https://dl.acm.org/doi/10.1016/j.inffus.2023.03.016

Shams S, Sadia B, and Aslam M. 2022. Intent Detection in Urdu Queries using Fine-tuned BERT models. In 2022 16th International Conference on Open Source Systems and Technologies (ICOSST) (1-6). Online publication date: 14-Dec-2022. https://doi.org/10.1109/ICOSST57195.2022.10016834

1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches

Published In

CCRIS '20: Proceedings of the 2020 1st International Conference on Control, Robotics and Intelligent System

October 2020

217 pages

ISBN:9781450388054

DOI:10.1145/3437802

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

CCRIS 2020

CCRIS 2020: 2020 International Conference on Control, Robotics and Intelligent System

October 27 - 29, 2020

Xiamen, China
