Abstract
Long Short-Term Memory recurrent neural networks (LSTM RNNs) are widely used in Automatic Speech Recognition (ASR) and largely overcome the vanishing-gradient problem. A Bidirectional LSTM (BLSTM) processes the input in both directions, drawing on past as well as future context, and therefore performs well. However, BLSTMs are difficult to deploy because of their high computational requirements, and the vanishing-gradient problem still persists when multiple LSTM layers are stacked. The large size of LSTM networks also leaves them vulnerable to overfitting. The Gated Recurrent Unit (GRU) is a more recent recurrent unit with only two gates: the update gate plays a role similar to the combined forget and input gates of the LSTM, while the reset gate decides how much of the previous state to retain. Because a GRU network is smaller, it is less prone to overfitting and trains faster than an LSTM. The proposed work has a two-fold architecture. In the first stage, the reset and update gates of the GRU are merged into a Single Gated Unit (SGU). The SGU uses about half as many parameters as an LSTM and one third fewer than a GRU, which increases its training speed. In the second stage, the SGU is combined with a deep bidirectional design (DBSGU) to build a hybrid acoustic model that needs fewer parameters and improves learning capability. The proposed model is compared, in terms of similarities and differences, with Deep Bidirectional GRU (DBGRU) and Deep Bidirectional LSTM (DBLSTM) baselines, and achieves a 2 to 4% reduction in Word Error Rate (WER) while learning about 30% faster. The entire work is evaluated on the Crowd-Sourced high-quality Multi-Speaker speech (CSMS) dataset.
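The abstract does not give the SGU equations. As one plausible reading, the sketch below follows the standard GRU update with the reset and update gates merged into a single gate f_t, in the spirit of the minimal gated unit of Zhou et al. (2016); the names SGUCell, step, and bidirectional_pass are illustrative, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SGUCell:
    """Hypothetical Single Gated Unit: a GRU whose reset and update
    gates are merged into one gate f_t (an assumption, not the
    paper's published equations)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_size)
        # Two weight blocks instead of the LSTM's four or the GRU's
        # three: the source of the claimed parameter savings
        # (half of an LSTM, one third fewer than a GRU).
        self.W_f = rng.uniform(-s, s, (hidden_size, input_size))
        self.U_f = rng.uniform(-s, s, (hidden_size, hidden_size))
        self.b_f = np.zeros(hidden_size)
        self.W_h = rng.uniform(-s, s, (hidden_size, input_size))
        self.U_h = rng.uniform(-s, s, (hidden_size, hidden_size))
        self.b_h = np.zeros(hidden_size)

    def step(self, x_t, h_prev):
        # The single gate f_t plays both roles: it resets h_prev
        # inside the candidate state and interpolates between
        # h_prev and the candidate.
        f_t = sigmoid(self.W_f @ x_t + self.U_f @ h_prev + self.b_f)
        h_tilde = np.tanh(self.W_h @ x_t + self.U_h @ (f_t * h_prev) + self.b_h)
        return (1.0 - f_t) * h_prev + f_t * h_tilde

def bidirectional_pass(cell_fwd, cell_bwd, xs, hidden_size):
    """One bidirectional SGU layer: run the sequence in both
    directions and concatenate the per-frame states (the 'B' in
    DBSGU); stacking such layers gives the deep model."""
    h_f = np.zeros(hidden_size)
    h_b = np.zeros(hidden_size)
    fwd, bwd = [], []
    for x in xs:
        h_f = cell_fwd.step(x, h_f)
        fwd.append(h_f)
    for x in reversed(xs):
        h_b = cell_bwd.step(x, h_b)
        bwd.append(h_b)
    return [np.concatenate([f, b]) for f, b in zip(fwd, reversed(bwd))]
```

Under this reading, merging the two gates removes one full set of input and recurrent weight matrices per cell, which is consistent with the parameter and training-speed savings claimed above.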