A robust model for domain recognition of acoustic communication using Bidirectional LSTM and deep neural network.

  • Original Article
  • Published in Neural Computing and Applications

Abstract

This paper proposes a robust model for domain recognition of acoustic communication using a Bidirectional LSTM and a deep neural network. The proposed model consists of five layers: speech recognition, word embedding, a Bidirectional LSTM (BiLSTM) layer, and two fully connected (FC) layers. First, speech is recognized, and the resulting text is preprocessed before being passed to the proposed model to obtain the domain of communication. The word embedding layer takes the padded sentence as its input sequence and outputs the encoded sentence. The BiLSTM layer captures temporal features, while the fully connected layers capture linear and nonlinear combinations of those features. We compared the proposed model against conventional machine learning algorithms such as SVM, KNN, random forest, and gradient boosting, and found that it outperforms them, achieving an accuracy of 90.09%.
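For illustration, the sketch below shows a minimal Keras/TensorFlow version of the described pipeline (word embedding, a BiLSTM layer, then two fully connected layers producing domain probabilities). The vocabulary size, sequence length, embedding dimension, hidden units, and number of domain classes are assumed placeholders, not values reported in the paper; the speech recognition and text preprocessing stages are omitted.

```python
# Minimal sketch of the described architecture, assuming Keras/TensorFlow.
# All hyperparameters below are illustrative assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 10000   # assumed vocabulary size
MAX_LEN = 50         # assumed padded sentence length
EMBED_DIM = 128      # assumed embedding dimension
NUM_CLASSES = 8      # assumed number of communication domains

model = models.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    # Word embedding: padded token IDs in, encoded sentence out
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    # BiLSTM captures temporal features in both directions
    layers.Bidirectional(layers.LSTM(64)),
    # Two fully connected layers capture linear and nonlinear
    # combinations of the temporal features
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# One dummy padded sentence, to show expected input/output shapes
dummy = np.random.randint(0, VOCAB_SIZE, size=(1, MAX_LEN))
print(model.predict(dummy).shape)  # (1, NUM_CLASSES) domain probabilities
```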




Author information

Corresponding author

Correspondence to Sandeep Rathor.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rathor, S., Agrawal, S. A robust model for domain recognition of acoustic communication using Bidirectional LSTM and deep neural network. Neural Comput & Applic 33, 11223–11232 (2021). https://doi.org/10.1007/s00521-020-05569-0
