Abstract
Building a conventional automatic speech recognition (ASR) system based on a hidden Markov model (HMM)/deep neural network (DNN) makes the system complex, as it requires various modules such as acoustic models, a lexicon, linguistic resources, and language models, which is particularly difficult for low-resource languages. In contrast, an End-to-End architecture greatly simplifies model building by representing these complex modules with a single deep network and by replacing linguistic resources with data-driven learning techniques. In this paper, we present our work on exploring the End-to-End (E2E) framework for a Khasi speech recognition system, together with a novel extension towards the development of a speech corpus for the standard Khasi dialect. We implemented the proposed E2E model using the Nabu ASR toolkit. Additionally, three other models (monophone, triphone, and hybrid DNN) were built. Comparing the results, a significant improvement was achieved with the proposed method, particularly with connectionist temporal classification (CTC), which obtained a character error rate (CER) of 5.04%.
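For reference, the character error rate (CER) reported above is the Levenshtein (edit) distance between the recognized and reference character sequences, normalized by the reference length. The following is a minimal, self-contained Python sketch of that metric, not code from the paper; the function name `cer` and the example strings are purely illustrative.

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance between the two character
    sequences, divided by the number of reference characters."""
    ref, hyp = list(reference), list(hypothesis)
    # Dynamic-programming table for the Levenshtein distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


if __name__ == "__main__":
    # Hypothetical strings for illustration only (one character substituted).
    print(cer("khublei shibun", "khublei shibon"))
```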