Abstract
The growing use of human–machine interfaces across diverse domains has driven the evolution of Automatic Speech Recognition (ASR) systems over the last two decades. Recently, attention has shifted towards applying machine learning techniques to under-resourced human languages, primarily to design voice-activated digital tools for the sizable population of computer-illiterate speakers. Most work in this field has employed shallow models such as conventional Artificial Neural Networks and Hidden Markov Models, in combination with Mel Frequency Cepstral Coefficients (MFCC) and other relevant features. Although these shallow models are effective, recent research has turned to deep learning models for ASR, especially for under-resourced languages, both to reduce human intervention in the approach and to improve system performance. Sylheti, a member of the Indo-Aryan language group, is an under-resourced language with more than 10 million speakers worldwide, mostly in India and Bangladesh. Addressing this need, the present work aims to design a robust ASR model for Sylheti using a state-of-the-art deep learning technique, the Convolutional Neural Network (CNN). To identify the most suitable ASR model for Sylheti, several ASR approaches are formulated and trained on Sylheti isolated and connected words. The specially configured CNN-based ASR model is trained on both clean and noisy speech data, which is necessary for making the system robust. A comparative analysis is then presented by configuring the ASR model with shallow models: a feed-forward neural network, a recurrent neural network, a Hidden Markov model, and a time-delay neural network.
Experimental results indicate that the proposed CNN-based ASR system performs well for Sylheti, and the accuracy it achieves is satisfactory, although the system exhibits some training latency.
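The pipeline the abstract describes, MFCC features feeding a CNN classifier over isolated words, can be sketched as follows. This is an illustrative NumPy-only sketch with randomly initialised weights and assumed hyper-parameters (16 kHz audio, 13 cepstral coefficients, ten word classes); it is not the authors' trained configuration.

```python
import numpy as np

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    """Triangular filters spaced evenly on the mel scale."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv_mel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return fb

def mfcc(signal, frame_len=400, hop=160, n_ceps=13):
    """Frame -> Hamming window -> power spectrum -> log mel energies -> DCT-II."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=512)) ** 2 / 512
    energies = np.log(power @ mel_filterbank().T + 1e-10)
    n = energies.shape[1]
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * np.arange(n) + 1) / (2 * n))
    return energies @ basis.T                       # (n_frames, n_ceps)

def cnn_word_scores(feats, n_classes=10, rng=np.random.default_rng(0)):
    """One 3x3 conv layer + ReLU + global average pooling + softmax."""
    kernel = rng.standard_normal((8, 3, 3))         # 8 random filters
    h, w = feats.shape
    maps = np.zeros((8, h - 2, w - 2))
    for k in range(8):                              # valid-mode 2-D convolution
        for i in range(h - 2):
            for j in range(w - 2):
                maps[k, i, j] = np.sum(feats[i:i + 3, j:j + 3] * kernel[k])
    pooled = np.maximum(maps, 0).mean(axis=(1, 2))  # ReLU + global pooling
    logits = rng.standard_normal((n_classes, 8)) @ pooled
    e = np.exp(logits - logits.max())
    return e / e.sum()                              # class probabilities

wav = np.sin(2 * np.pi * 300 * np.arange(16000) / 16000)  # 1 s dummy tone
probs = cnn_word_scores(mfcc(wav))
```

In a real system the convolution and classifier weights would of course be learned from the labelled clean and noisy Sylheti recordings, rather than drawn at random as here.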
Chakraborty, G., Sharma, M., Saikia, N. et al. Soft-computation based speech recognition system for Sylheti language. Int J Speech Technol 25, 499–509 (2022). https://doi.org/10.1007/s10772-022-09976-7