Abstract
One of the significant tasks in spoken language understanding (SLU) is intent detection. In this paper, we propose a deep-learning-based ensemble model for intent detection. The outputs of different deep learning architectures, such as a convolutional neural network (CNN) and the recurrent neural network (RNN) variants long short-term memory (LSTM) and gated recurrent units (GRU), are combined using a multi-layer perceptron (MLP). The classifiers are trained on a combined word embedding representation obtained from both Word2Vec and GloVe. Our experiments on the benchmark ATIS dataset show state-of-the-art performance for intent detection.
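The combination step described in the abstract can be illustrated with a small stacking-style sketch. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: it assumes the Word2Vec and GloVe vectors for a token are combined by concatenation, that each base model (CNN, LSTM, GRU) emits a softmax distribution over intents, and that the MLP combiner has a single tanh hidden layer trained with SGD on cross-entropy. The names `combined_embedding` and `MLPCombiner`, the layer sizes, the learning rate, and the toy data are all hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def combined_embedding(token: str, w2v: Dict[str, Vector], glove: Dict[str, Vector],
                       dim_w2v: int, dim_glove: int) -> Vector:
    """Concatenate a token's Word2Vec and GloVe vectors (assumed scheme).

    A token missing from either table contributes zeros for that part.
    """
    return list(w2v.get(token, [0.0] * dim_w2v)) + list(glove.get(token, [0.0] * dim_glove))


def softmax(z: Sequence[float]) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class MLPCombiner:
    """Hypothetical one-hidden-layer tanh MLP over the stacked class-probability
    outputs of several base classifiers (e.g. CNN, LSTM, GRU)."""

    def __init__(self, n_inputs: int, n_hidden: int, n_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_classes)]
        self.b2 = [0.0] * n_classes

    @staticmethod
    def stack(base_outputs: Sequence[Sequence[float]]) -> Vector:
        # Concatenate every base model's probability vector into one input.
        return [p for probs in base_outputs for p in probs]

    def _forward(self, x: Vector) -> Tuple[Vector, Vector]:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        logits = [sum(w * hj for w, hj in zip(row, h)) + b
                  for row, b in zip(self.w2, self.b2)]
        return h, softmax(logits)

    def predict_proba(self, base_outputs: Sequence[Sequence[float]]) -> Vector:
        return self._forward(self.stack(base_outputs))[1]

    def loss(self, base_outputs: Sequence[Sequence[float]], label: int) -> float:
        # Cross-entropy of the final intent distribution for one example.
        return -math.log(self.predict_proba(base_outputs)[label])

    def train_step(self, base_outputs: Sequence[Sequence[float]], label: int,
                   lr: float = 0.3) -> None:
        """One SGD step on the cross-entropy loss for a single example."""
        x = self.stack(base_outputs)
        h, p = self._forward(x)
        # d(loss)/d(logits) for softmax + cross-entropy is p - one_hot(label).
        d_out = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
        # Backpropagate through w2 (before updating it) and the tanh derivative.
        d_h = [sum(d_out[k] * self.w2[k][j] for k in range(len(d_out))) * (1.0 - h[j] ** 2)
               for j in range(len(h))]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_h):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj


if __name__ == "__main__":
    # Toy (CNN, LSTM, GRU) softmax outputs over two hypothetical intents.
    data = [
        ([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]], 0),
        ([[0.2, 0.8], [0.1, 0.9], [0.3, 0.7]], 1),
    ]
    combiner = MLPCombiner(n_inputs=6, n_hidden=8, n_classes=2)
    for _ in range(500):
        for outputs, label in data:
            combiner.train_step(outputs, label)
    for outputs, label in data:
        print(label, combiner.predict_proba(outputs))
```

Feeding the base models' probability vectors to a trainable combiner, rather than taking a fixed majority vote, lets the MLP learn how much to trust each architecture; in a real system the base outputs would come from CNN, LSTM, and GRU classifiers run over the combined embeddings.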
Acknowledgment
The research reported in this paper is partially supported by Accenture IIT AI Lab, IIT Patna.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Firdaus, M., Bhatnagar, S., Ekbal, A., Bhattacharyya, P. (2018). Intent Detection for Spoken Language Understanding Using a Deep Ensemble Model. In: Geng, X., Kang, BH. (eds) PRICAI 2018: Trends in Artificial Intelligence. PRICAI 2018. Lecture Notes in Computer Science, vol 11012. Springer, Cham. https://doi.org/10.1007/978-3-319-97304-3_48
DOI: https://doi.org/10.1007/978-3-319-97304-3_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97303-6
Online ISBN: 978-3-319-97304-3
eBook Packages: Computer Science, Computer Science (R0)