A Comparison of LSTM and GRU for Bengali Speech-to-Text Transformation

Jahan, Nusrat; Sultana, Zakia; Chowdhury, Fahim; Ahmed, Sajjad; Parvez, Mohammad Zavid; Barua, Prabal Datta; Chakraborty, Subrata

doi:10.1007/978-3-031-33743-7_18


Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 700)

Included in the following conference series:

International Conference on Advances in Computing Research


Abstract

This paper presents an approach to speech-to-text conversion in the Bengali language. Most existing methodologies in this area focus on languages other than Bengali. We first prepared a novel dataset of 56 unique words recorded from 160 individual subjects. We then describe our approach to increasing speech-to-text accuracy for Bengali, starting with the Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) algorithms. On further observation, we found that the GRU failed to give stable output, so we moved entirely to the LSTM algorithm, with which we achieved 90% accuracy on an unexplored dataset. Voices from several demographic populations, together with noise, were used to validate the model. In the testing phase, we tried a variety of classes that differed in length, complexity, noise, and speaker gender. We expect that this research will help in developing a real-time Bengali speech-to-text recognition model.
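As a rough illustration of how the two recurrent units compared in the abstract differ in size, the sketch below counts the trainable parameters of a single-layer LSTM and a single-layer GRU word classifier using the textbook gate formulation (four weight blocks for LSTM, three for GRU). Only the 56 word classes come from the abstract; the feature dimension, hidden size, and single-layer architecture are assumptions made for the example and are not taken from the paper.

```python
# Hypothetical sizes: the abstract does not state feature or hidden dimensions.
N_FEATURES = 13   # assumed, e.g. 13 MFCC coefficients per audio frame
N_HIDDEN = 128    # assumed hidden-state size
N_CLASSES = 56    # from the abstract: 56 unique words


def recurrent_params(n_gates: int, n_in: int, n_hidden: int) -> int:
    """Weights and biases of one recurrent layer with n_gates weight blocks.

    Each block has an input matrix (n_hidden x n_in), a recurrent matrix
    (n_hidden x n_hidden), and a bias vector (n_hidden).
    """
    return n_gates * (n_hidden * n_in + n_hidden * n_hidden + n_hidden)


def classifier_params(n_gates: int) -> int:
    """Recurrent layer followed by a dense softmax layer over the word classes."""
    dense = N_HIDDEN * N_CLASSES + N_CLASSES
    return recurrent_params(n_gates, N_FEATURES, N_HIDDEN) + dense


lstm_total = classifier_params(4)  # LSTM: input, forget, output gates + candidate
gru_total = classifier_params(3)   # GRU: update, reset gates + candidate

print(f"LSTM classifier parameters: {lstm_total}")
print(f"GRU classifier parameters:  {gru_total}")
```

Under these assumptions the GRU's recurrent layer has three quarters of the LSTM's recurrent parameters. Some library implementations add an extra bias term per GRU block, so exact counts may differ slightly from this formula.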



Author information

Authors and Affiliations

Department of Computer Science and Engineering, BRAC University, Dhaka, Bangladesh
Nusrat Jahan, Zakia Sultana, Fahim Chowdhury & Sajjad Ahmed
Information Technology, APIC, Adelaide, Australia
Mohammad Zavid Parvez
Information Technology, Torrens University, Adelaide, Australia
Mohammad Zavid Parvez
Peter Faber Business School, Australian Catholic University, Blacktown, Australia
Mohammad Zavid Parvez
School of Computing, Mathematics, and Engineering, Charles Sturt University, Bathurst, Australia
Mohammad Zavid Parvez
School of Business (Information System), University of Southern Queensland, Toowoomba, Australia
Prabal Datta Barua
School of Science and Technology, Faculty of Science, Agriculture, Business and Law, University of New England, Armidale, Australia
Prabal Datta Barua & Subrata Chakraborty
Cogninet Australia Pty Ltd, Surry Hills, Australia
Prabal Datta Barua


Corresponding author

Correspondence to Fahim Chowdhury.

Editor information

Editors and Affiliations

University of Detroit Mercy, Detroit, MI, USA
Kevin Daimi
Charles Sturt University, Sydney, NSW, Australia
Abeer Al Sadoon


About this paper

Cite this paper

Jahan, N. et al. (2023). A Comparison of LSTM and GRU for Bengali Speech-to-Text Transformation. In: Daimi, K., Al Sadoon, A. (eds) Proceedings of the 2023 International Conference on Advances in Computing Research (ACR’23). ACR 2023. Lecture Notes in Networks and Systems, vol 700. Springer, Cham. https://doi.org/10.1007/978-3-031-33743-7_18


DOI: https://doi.org/10.1007/978-3-031-33743-7_18
Published: 27 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33742-0
Online ISBN: 978-3-031-33743-7
eBook Packages: Intelligent Technologies and Robotics (R0)
