Skip to main content

A Novel Bangla Spoken Numerals Recognition System Using Convolutional Neural Network

  • Conference paper
  • First Online:
Machine Intelligence and Emerging Technologies (MIET 2022)

Abstract

Automatic speech recognition (ASR) converts human speech into text or words that can be understood and classified easily. Only digits from (0-9) were used in the few studies on Bangla number recognition systems, which completely ignored duo-syllabic and tri-syllabic numbers. Audio samples of (0-99) Bangla spoken numbers from Bangladeshi citizens of various genders, ages, and languages were used to construct a speech dataset of spoken numbers in this work. Time shift, speed tuning, background noise mixing, and volume tuning are among the audio augmentation techniques used on the raw speech data. Then, to extract meaningful features from the data, Mel Frequency Cepstrum Coefficients (MFCCs) are used. This research developed a Bangla number recognition system based on Convolutional Neural Networks (CNNs). Our proposed dataset includes the diversity of speakers in terms of age, gender, dialects and other criteria. The proposed method recognizes (0-99) Bangla spoken numbers with 89.61% accuracy across the entire dataset. The model’s efficacy was also evaluated using a 10-fold cross-validation procedure, with 89.74% accuracy for recognizing (0-99) Bangla spoken numbers across the entire dataset. This proposed method is also compared to some existing works in the field of recognizing spoken digits classes, demonstrating its dominance.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. 12 benefits of speech to text. https://www.dataworxs.com.au/12-benefits-speech-text. Accessed30 June 2021

  2. Mel frequency cepstral coefficient (MFCC) tutorial. http://www.practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs. Accessed 28 June 2021

  3. Mel-frequency cepstrum. https://en.wikipedia.org/wiki/Mel-frequencycepstrum. Accessed 28 June 2021

  4. speech recognition. https://searchcustomerexperience.techtarget.com/definition/speech-recognition. Accessed 30 June 2021

  5. Ahammad, K., Rahman, M.M.: Connected Bangla speech recognition using artificial neural network. Int. J. Comput. Appl. 149(9), 38–41 (2016)

    Google Scholar 

  6. Ahmed, T., Wahid, M.F., Habib, M.A.: Implementation of Bangla speech recognition in voice input speech output (viso) calculator. In: 2018 International Conference on Bangla Speech and Language Processing (ICBSLP), pp. 1–5. IEEE (2018)

    Google Scholar 

  7. Ali, M.A., Hossain, M., Bhuiyan, M.N., et al.: Automatic speech recognition technique for bangla words. Int. J. Adv. Sci. Technol. 50 (2013)

    Google Scholar 

  8. Chung, T.D., Drieberg, M., Hassan, M.F.B., Khalyasmaa, A.: End-to-end conversion speed analysis of an FPT. AI-based text-to-speech application. In: 2020 IEEE 2nd Global Conference on Life Sciences and Technologies (LifeTech), pp. 136–139. IEEE (2020)

    Google Scholar 

  9. Gales, M., Young, S.: The application of hidden Markov models in speech recognition (2008)

    Google Scholar 

  10. Graves, A., Beringer, N., Schmidhuber, J.: A comparison between spiking and differentiable recurrent neural networks on spoken digit recognition. In: The 23rd IASTED International Conference on Modelling, Identification, and Control (2004)

    Google Scholar 

  11. Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18(5–6), 602–610 (2005)

    Article  Google Scholar 

  12. Hasnat, M., Mowla, J., Khan, M., et al.: Isolated and continuous Bangla speech recognition: implementation, performance and application perspective (2007)

    Google Scholar 

  13. Hossain, S., Rahman, M., Ahmed, F., Dewan, M.: Bangla speech synthesis, analysis, and recognition: an overview. Proc, NCCPB (2004)

    Google Scholar 

  14. Islam, J., Mubassira, M., Islam, M.R., Das, A.K.: A speech recognition system for Bengali language using recurrent neural network. In: 2019 IEEE 4th International Conference on Computer and Communication Systems (ICCCS), pp. 73–76. IEEE (2019)

    Google Scholar 

  15. Liu, T., Fang, S., Zhao, Y., Wang, P., Zhang, J.: Implementation of training convolutional neural networks. arXiv preprint arXiv:1506.01195 (2015)

  16. Muhammad, G., Alotaibi, Y.A., Huda, M.N.: Automatic speech recognition for bangla digits. In: 2009 12th International Conference on Computers and Information Technology, pp. 379–383. IEEE (2009)

    Google Scholar 

  17. Nahid, M.M.H., Purkaystha, B., Islam, M.S.: Bengali speech recognition: a double layered LSTM-RNN approach. In: 2017 20th International Conference of Computer and Information Technology (ICCIT), pp. 1–6. IEEE (2017)

    Google Scholar 

  18. Netshiombo, D., Mokgonyane, T.B., Manamela, M.J., Modipa, T.I.: Spoken digit recognition system for an extremely under-resourced language

    Google Scholar 

  19. Park, D.S., et al.: Specaugment: a simple data augmentation method for automatic speech recognition. arXiv preprint arXiv:1904.08779 (2019)

  20. Paul, A.K., Das, D., Kamal, M.M.: Bangla speech recognition system using LPC and ANN. In: 2009 Seventh International Conference on Advances in Pattern Recognition, pp. 171–174. IEEE (2009)

    Google Scholar 

  21. Paul, B., Bera, S., Paul, R., Phadikar, S.: Bengali spoken numerals recognition by MFCC and GMM technique. In: Mallick, P.K., Bhoi, A.K., Chae, G.-S., Kalita, K. (eds.) Advances in Electronics, Communication and Computing. LNEE, vol. 709, pp. 85–96. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-8752-8_9

    Chapter  Google Scholar 

  22. Reddy, D.R.: Speech recognition by machine: a review. Proc. IEEE 64(4), 501–531 (1976)

    Article  Google Scholar 

  23. Renjith, S., Joseph, A., Anish Babu K.K.: Isolated digit recognition for Malayalam-an application perspective. In: 2013 International Conference on Control Communication and Computing (ICCC), pp. 190–193. IEEE (2013)

    Google Scholar 

  24. Saxena, B., Wahi, C.: Hindi digits recognition system on speech data collected in different natural noise environments. In: International Conference on Computer Science, Engineering and Information Technology (CSITY 2015) February, pp. 14–15 (2015)

    Google Scholar 

  25. Sen, O., et al.: Bangla natural language processing: a comprehensive analysis of classical, machine learning, and deep learning based methods. IEEE Access 10, 38999–39044 (2022)

    Article  Google Scholar 

  26. Sen, O., Roy, P., et al.: A convolutional neural network based approach to recognize Bangla spoken digits from speech signal. In: 2021 International Conference on Electronics, Communications and Information Technology (ICECIT), pp. 1–4. IEEE (2021)

    Google Scholar 

  27. Sharmin, R., Rahut, S.K., Huq, M.R.: Bengali spoken digit classification: a deep learning approach using convolutional neural network. Procedia Comput. Sci. 171, 1381–1388 (2020)

    Article  Google Scholar 

  28. Shuvo, M., Shahriyar, S.A., Akhand, M.: Bangla numeral recognition from speech signal using convolutional neural network. In: 2019 International Conference on Bangla Speech and Language Processing (ICBSLP), pp. 1–4. IEEE (2019)

    Google Scholar 

  29. Sultana, S., Akhand, M., Das, P.K., Rahman, M.H.: Bangla speech-to-text conversion using SAPI. In: 2012 International Conference on Computer and Communication Engineering (ICCCE), pp. 385–390. IEEE (2012)

    Google Scholar 

  30. Sumit, S.H., Al Muntasir, T., Zaman, M.A., Nandi, R.N., Sourov, T.: Noise robust end-to-end speech recognition for Bangla language. In: 2018 International Conference on Bangla Speech and Language Processing (ICBSLP), pp. 1–5. IEEE (2018)

    Google Scholar 

  31. Sumon, S.A., Chowdhury, J., Debnath, S., Mohammed, N., Momen, S.: Bangla short speech commands recognition using convolutional neural networks. In: 2018 International Conference on Bangla Speech and Language Processing (ICBSLP), pp. 1–6. IEEE (2018)

    Google Scholar 

  32. Taufik, D., Hanafiah, N.: Autovat: an automated visual acuity test using spoken digit recognition with MEL frequency cepstral coefficients and convolutional neural network. Procedia Comput. Sci. 179, 458–467 (2021)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ovishake Sen .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Sen, O., Roy, P., Al-Mahmud (2023). A Novel Bangla Spoken Numerals Recognition System Using Convolutional Neural Network. In: Satu, M.S., Moni, M.A., Kaiser, M.S., Arefin, M.S. (eds) Machine Intelligence and Emerging Technologies. MIET 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 490. Springer, Cham. https://doi.org/10.1007/978-3-031-34619-4_28

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-34619-4_28

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-34618-7

  • Online ISBN: 978-3-031-34619-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics