Abstract
This paper demonstrates the effect of incorporating deep neural network (DNN) techniques into speech recognition systems. A continuous speech recognition system for the Punjabi language is implemented with hybrid DNN acoustic models on the Kaldi toolkit. Recognition performance improves markedly with DNNs, and Karel's DNN recipe gives better recognition performance than Dan's DNN recipe. Of the MFCC and PLP features, MFCC gives better results. The triphone model yields a lower word error rate than the monophone model, and a trigram language model yields a lower word error rate than a bigram model on the Kaldi toolkit for the continuous Punjabi speech recognition system.
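All of the comparisons above (DNN vs. GMM, MFCC vs. PLP, triphone vs. monophone, trigram vs. bigram) are reported in terms of word error rate (WER), the standard ASR metric that Kaldi's scoring scripts compute: the minimum number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. A minimal, self-contained sketch of this computation (not the paper's or Kaldi's actual scoring code) is:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over words / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = min edits to turn the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("b" -> "x") and one deletion ("d") over 4 reference
# words gives WER = 2/4 = 0.5.
print(wer("a b c d", "a x c"))  # -> 0.5
```

In practice Kaldi reports this via its `compute-wer` tool over decoded lattices; the sketch above only illustrates the metric itself.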
Guglani, J., Mishra, A.N. DNN based continuous speech recognition system of Punjabi language on Kaldi toolkit. Int J Speech Technol 24, 41–45 (2021). https://doi.org/10.1007/s10772-020-09717-8