
Soft-computation based speech recognition system for Sylheti language

Published in: International Journal of Speech Technology

Abstract

The encouraging growth of human–machine interfaces across diverse areas has driven the evolution of Automatic Speech Recognition (ASR) systems over the last two decades. Lately, the trend has been towards applying machine learning techniques to under-resourced human languages, primarily to design voice-activated digital tools for the sizable population of computer-illiterate speakers. A vast majority of works in this field have employed shallow models such as conventional Artificial Neural Networks and Hidden Markov Models, in combination with Mel Frequency Cepstral Coefficients and other relevant features, for speech recognition applications. Although these shallow models are effective, recent research has turned to deep learning models for ASR, especially for under-resourced languages, both to minimize human intervention and to yield better system performance. Sylheti, a member of the Indo-Aryan language group, is an under-resourced language with more than 10 million speakers worldwide, mostly in India and Bangladesh. Addressing this need, this work designs a robust ASR model for Sylheti by employing a state-of-the-art deep learning technique, the Convolutional Neural Network (CNN). To identify the most suitable ASR model for Sylheti, several ASR approaches are formulated and trained on Sylheti isolated and connected words. The CNN-based ASR model is trained on both clean and noisy speech data, which is necessary for making the system robust. A comparative analysis is then presented by configuring the ASR model with several shallow models: a Feed-forward Neural Network, a Recurrent Neural Network, a Hidden Markov Model, and a Time Delay Neural Network. Experimental results indicate that the proposed CNN-based ASR system works well for Sylheti, and the accuracy obtained is satisfactory despite a certain training latency.
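The abstract notes that the CNN-based recogniser is trained on both clean and noisy speech to make it robust. The paper's exact noise-mixing procedure is not given here; a common approach is to add noise scaled to a target signal-to-noise ratio (SNR). A minimal NumPy sketch of that idea follows — the function name, the 440 Hz test tone, and the 10 dB target are illustrative assumptions, not details from the paper:

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + scaled noise has the requested SNR in dB."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Target noise power for the requested SNR: P_noise = P_clean / 10^(SNR/10)
    target_noise_power = clean_power / (10 ** (snr_db / 10))
    return clean + noise * np.sqrt(target_noise_power / noise_power)

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s, 440 Hz tone at 16 kHz
noise = rng.normal(size=16000)                               # white Gaussian noise
noisy = mix_at_snr(clean, noise, snr_db=10)

# Verify the achieved SNR matches the 10 dB target
achieved = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(achieved, 1))  # → 10.0
```

Augmenting the clean training set with copies mixed at several SNR levels is one standard way to expose the network to the acoustic conditions it will face at test time.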






Author information


Corresponding author

Correspondence to Gautam Chakraborty.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Chakraborty, G., Sharma, M., Saikia, N. et al. Soft-computation based speech recognition system for Sylheti language. Int J Speech Technol 25, 499–509 (2022). https://doi.org/10.1007/s10772-022-09976-7

