A robust model for domain recognition of acoustic communication using Bidirectional LSTM and deep neural network.

  • Original Article
  • Published in Neural Computing and Applications

Abstract

This paper proposes a robust model for domain recognition of acoustic communication using a Bidirectional LSTM and a deep neural network. The proposed model consists of five layers: speech recognition, word embedding, a Bidirectional LSTM (BiLSTM) layer, and two fully connected (FC) layers. First, speech is recognized, and the resulting text is preprocessed before being passed to the proposed model to obtain the domain of communication. The word embedding layer takes the padded sentence as its input sequence and outputs the encoded sentence. The BiLSTM layer captures temporal features, while the fully connected layers capture linear and nonlinear combinations of those features. We compared the proposed model against conventional machine learning algorithms such as SVM, KNN, random forest, and gradient boosting, and found that it outperforms them, achieving an accuracy of 90.09%.
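For illustration, the sketch below shows a minimal Keras/TensorFlow version of the described pipeline (word embedding, a BiLSTM layer, then two fully connected layers producing domain probabilities). The vocabulary size, sequence length, embedding dimension, hidden units, and number of domain classes are assumed placeholders, not values reported in the paper; the speech recognition and text preprocessing stages are omitted.

```python
# Minimal sketch of the described architecture, assuming Keras/TensorFlow.
# All hyperparameters below are illustrative assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 10000   # assumed vocabulary size
MAX_LEN = 50         # assumed padded sentence length
EMBED_DIM = 128      # assumed embedding dimension
NUM_CLASSES = 8      # assumed number of communication domains

model = models.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    # Word embedding: padded token IDs in, encoded sentence out
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    # BiLSTM captures temporal features in both directions
    layers.Bidirectional(layers.LSTM(64)),
    # Two fully connected layers capture linear and nonlinear
    # combinations of the temporal features
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# One dummy padded sentence, to show expected input/output shapes
dummy = np.random.randint(0, VOCAB_SIZE, size=(1, MAX_LEN))
print(model.predict(dummy).shape)  # (1, NUM_CLASSES) domain probabilities
```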




Author information

Corresponding author

Correspondence to Sandeep Rathor.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rathor, S., Agrawal, S. A robust model for domain recognition of acoustic communication using Bidirectional LSTM and deep neural network. Neural Comput & Applic 33, 11223–11232 (2021). https://doi.org/10.1007/s00521-020-05569-0
