Exploring end-to-end framework towards Khasi speech recognition system

International Journal of Speech Technology

Abstract

Building a conventional automatic speech recognition (ASR) system based on a hidden Markov model (HMM)/deep neural network (DNN) is complex because it requires several modules and resources, such as acoustic models, lexicons, linguistic resources and language models, a requirement that is particularly demanding for low-resource languages. In contrast, an End-to-End architecture greatly simplifies the model-building process by representing these complex modules with a simple deep network and by replacing linguistic resources with data-driven learning techniques. In this paper, we present our prior work exploring an End-to-End (E2E) framework for a Khasi speech recognition system, together with a novel extension towards the development of speech corpora for the standard Khasi dialect. We implemented the proposed E2E model using the Nabu ASR toolkit. Additionally, three other models (monophone, triphone and hybrid DNN) were built. Comparing the results, the proposed method achieved a significant improvement, particularly with connectionist temporal classification (CTC), which reached a character error rate (CER) of 5.04%.
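For readers unfamiliar with the metric, the sketch below shows one common way a character error rate can be computed: the Levenshtein (edit) distance between a reference transcript and a hypothesis, divided by the number of reference characters. It is an illustration only, not the scoring code used in the paper; the function names `edit_distance` and `cer` and the sample strings are assumptions introduced here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum number of character insertions,
    deletions and substitutions needed to turn `hyp` into `ref`."""
    # prev[j] holds the distance between the first i-1 reference
    # characters and the first j hypothesis characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from i reference characters to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical strings chosen for illustration, not data from the paper.
    print(f"CER = {cer('kitten', 'sitting'):.2%}")
```

Under this definition, a CER of 5.04% would correspond to roughly five character edits per hundred reference characters. Toolkits may differ in details such as whitespace handling or text normalisation before scoring, so the figures are not necessarily comparable across setups.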

Author information

Corresponding author

Correspondence to Bronson Syiem.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Syiem, B., Singh, L.J. Exploring end-to-end framework towards Khasi speech recognition system. Int J Speech Technol 24, 419–424 (2021). https://doi.org/10.1007/s10772-021-09811-5

  • DOI: https://doi.org/10.1007/s10772-021-09811-5
