Abstract
This paper proposes a robust model for domain recognition of acoustic communication using a Bidirectional LSTM and a deep neural network. The proposed model consists of five layers, namely speech recognition, word embedding, a Bidirectional LSTM (BiLSTM) layer, and two fully connected (FC) layers. Initially, speech is recognized, and the resulting text is preprocessed before being passed to the proposed model to obtain the domain of communication. The word-embedding layer takes the padded sentence as the input sequence and outputs the encoded sentence. The BiLSTM layer captures the temporal features, while the fully connected layers capture linear and nonlinear combinations of those features. We compared the performance of the proposed model with conventional machine learning algorithms such as SVM, KNN, Random Forest, and Gradient Boosting, and found that the proposed model outperforms them, achieving an accuracy of 90.09%.
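To make the layer stack concrete, the following is a minimal sketch of the described pipeline (word embedding, a BiLSTM layer, and two FC layers) written in Keras. The vocabulary size, padded sentence length, embedding dimension, hidden-unit counts, and number of domain classes are illustrative assumptions, not values reported in the paper.

```python
# Minimal sketch of the described architecture: word embedding -> BiLSTM -> two FC layers.
# All sizes below (vocabulary, sequence length, embedding dim, hidden units, number of
# domain classes) are illustrative assumptions, not values taken from the paper.
from tensorflow.keras import layers, models

VOCAB_SIZE = 10000   # assumed vocabulary size after text preprocessing
MAX_LEN = 50         # assumed padded sentence length
EMBED_DIM = 100      # assumed word-embedding dimension
NUM_DOMAINS = 5      # assumed number of communication domains

inputs = layers.Input(shape=(MAX_LEN,))                       # padded token-id sequence
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)           # word embedding: encoded sentence
x = layers.Bidirectional(layers.LSTM(128))(x)                 # BiLSTM captures temporal features
x = layers.Dense(64, activation="relu")(x)                    # FC layer 1: feature combinations
outputs = layers.Dense(NUM_DOMAINS, activation="softmax")(x)  # FC layer 2: domain probabilities
model = models.Model(inputs, outputs)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

In practice, the recognized and preprocessed transcripts would be tokenized, padded to the fixed sentence length, and passed to `model.fit` together with integer domain labels.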






Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Cite this article
Rathor, S., Agrawal, S. A robust model for domain recognition of acoustic communication using Bidirectional LSTM and deep neural network. Neural Comput & Applic 33, 11223–11232 (2021). https://doi.org/10.1007/s00521-020-05569-0