Abstract
For the last five decades, the HMM has been the dominant approach to handling the temporal variability of an input speech signal when building automatic speech recognition (ASR) systems. The GMM became an integral part of the HMM, modelling the output (emission) probability of each state, where a state stores the information of a short windowed frame: to fit a frame, it takes the frame's coefficients and assigns their posterior probability to the corresponding HMM state. In this paper, the deep neural network (DNN) is tested against the GMM; its many hidden layers allow the DNN, given a large training dataset, to avoid overfitting before its performance degrades. Implementing the DNN with a robust feature extraction approach yields a high performance margin in the Punjabi speech recognition system. For feature extraction, the baseline MFCC and GFCC approaches are integrated with cepstral mean and variance normalization (CMVN). Dimensionality reduction, decorrelation of the feature vectors, and speaker variability are then addressed with linear discriminant analysis (LDA), maximum likelihood linear transformation (MLLT), speaker adaptive training (SAT), and maximum likelihood linear regression (MLLR) adaptation models. Two hybrid classifiers, GMM–HMM and DNN–HMM, are evaluated on the resulting acoustic feature vectors on connected and continuous Punjabi speech corpora. Experiments show a notable improvement of 4–5% on the connected dataset and 1–3% on the continuous dataset.
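The cepstral mean and variance normalization step mentioned above can be sketched as follows. This is a minimal per-utterance CMVN in NumPy, not the authors' implementation: each cepstral coefficient is shifted to zero mean and scaled to unit variance across the frames of an utterance, which compensates for stationary channel effects. The function name and the small epsilon guard are illustrative choices.

```python
import numpy as np

def cmvn(features):
    """Per-utterance cepstral mean and variance normalization.

    features: (num_frames, num_coeffs) array of MFCC or GFCC coefficients.
    Returns an array of the same shape with zero mean and unit variance
    per coefficient dimension.
    """
    mean = features.mean(axis=0)          # per-coefficient mean over frames
    std = features.std(axis=0)            # per-coefficient standard deviation
    return (features - mean) / (std + 1e-10)  # epsilon avoids division by zero
```

In practice, toolkits also offer global (corpus-level) or speaker-level statistics instead of per-utterance ones; the choice trades robustness on short utterances against finer adaptation.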
Acknowledgements
This work was partially tested on a sample Punjabi corpus collected for the Language Resources for Auditory Impaired Persons project from IEEE SIGHT. The views and results in this work reflect the perspective of the authors. The authors would like to thank the Speech and Multimodal Laboratory members Mandeep, Sashi, and Nikhil at Chitkara University, Punjab. Special thanks to Dr. Syed, who provided valuable input in building the baseline DNN system.
Cite this article
Kadyan, V., Mantri, A., Aggarwal, R.K. et al. A comparative study of deep neural network based Punjabi-ASR system. Int J Speech Technol 22, 111–119 (2019). https://doi.org/10.1007/s10772-018-09577-3