A Comparison of LSTM and GRU for Bengali Speech-to-Text Transformation

  • Conference paper
Proceedings of the 2023 International Conference on Advances in Computing Research (ACR’23) (ACR 2023)

Abstract

This paper presents an approach to speech-to-text conversion in the Bengali language. Most existing methodologies in this area focus on languages other than Bengali. We prepared a novel dataset of 56 unique words spoken by 160 individual subjects. We then illustrate an approach to increasing speech-to-text accuracy for Bengali, starting with the Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) algorithms. On further observation, we found that the GRU failed to produce stable output, so we moved completely to the LSTM algorithm, with which we achieved 90% accuracy on an unexplored dataset. Voices from several demographic populations, together with noise, were used to validate the model. In the testing phase, we tried a variety of classes that differed in length, complexity, noise, and speaker gender. We expect that this research will help in developing a real-time Bengali speech-to-text recognition model.
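The abstract does not describe the network architecture, so the following is only a minimal NumPy sketch of how a single LSTM step and a single GRU step differ in general, not the authors' model. The helper `init_params`, the feature size of 13, the hidden size of 8, and the random weights are all assumptions made for illustration. The GRU shown is one common formulation, in which the reset gate is applied to the hidden state before the recurrent product.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_gates, input_dim, hidden_dim, rng):
    """Hypothetical helper: small random weights for n_gates stacked gates."""
    W = rng.normal(0.0, 0.1, size=(n_gates * hidden_dim, input_dim))   # input weights
    U = rng.normal(0.0, 0.1, size=(n_gates * hidden_dim, hidden_dim))  # recurrent weights
    b = np.zeros(n_gates * hidden_dim)                                 # biases
    return W, U, b

def lstm_step(x, h, c, params):
    """One LSTM step: three gates plus a candidate, with a separate cell state c."""
    W, U, b = params
    i, f, o, g = np.split(W @ x + U @ h + b, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    g = np.tanh(g)                                # candidate cell update
    c_new = f * c + i * g                         # forget old memory, write new
    h_new = o * np.tanh(c_new)                    # expose a gated view of the cell
    return h_new, c_new

def gru_step(x, h, params):
    """One GRU step: two gates plus a candidate, with no separate cell state."""
    W, U, b = params
    Wz, Wr, Wn = np.split(W, 3)
    Uz, Ur, Un = np.split(U, 3)
    bz, br, bn = np.split(b, 3)
    z = sigmoid(Wz @ x + Uz @ h + bz)             # update gate
    r = sigmoid(Wr @ x + Ur @ h + br)             # reset gate
    n = np.tanh(Wn @ x + Un @ (r * h) + bn)       # candidate hidden state
    return (1.0 - z) * n + z * h                  # blend candidate with old state

# Run both cells over a short random sequence (stand-in for per-frame audio features).
rng = np.random.default_rng(0)
input_dim, hidden_dim, n_frames = 13, 8, 20       # assumed sizes, not from the paper
lstm_params = init_params(4, input_dim, hidden_dim, rng)
gru_params = init_params(3, input_dim, hidden_dim, rng)

frames = rng.normal(size=(n_frames, input_dim))
h_lstm, c_lstm = np.zeros(hidden_dim), np.zeros(hidden_dim)
h_gru = np.zeros(hidden_dim)
for x in frames:
    h_lstm, c_lstm = lstm_step(x, h_lstm, c_lstm, lstm_params)
    h_gru = gru_step(x, h_gru, gru_params)

n_lstm = sum(p.size for p in lstm_params)
n_gru = sum(p.size for p in gru_params)
```

For equal input and hidden sizes, the GRU in this sketch has three quarters of the LSTM's parameters (`n_gru * 4 == n_lstm * 3`), because it has three stacked weight blocks instead of four. In a word classifier, the final hidden state would typically feed a softmax layer over the vocabulary.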

Corresponding author

Correspondence to Fahim Chowdhury.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Jahan, N. et al. (2023). A Comparison of LSTM and GRU for Bengali Speech-to-Text Transformation. In: Daimi, K., Al Sadoon, A. (eds) Proceedings of the 2023 International Conference on Advances in Computing Research (ACR’23). ACR 2023. Lecture Notes in Networks and Systems, vol 700. Springer, Cham. https://doi.org/10.1007/978-3-031-33743-7_18
