Abstract
Time-delay neural network (TDNN) acoustic models have advanced significantly in recent years, outperforming the traditional Gaussian mixture model-hidden Markov model (GMM-HMM) in large-vocabulary continuous speech recognition tasks. Our aim is to develop a practical automatic speech recognition (ASR) system for Lhasa Tibetan. To improve recognition accuracy, this paper investigates the performance of TDNN-based Tibetan acoustic modeling with several different phoneme sets, each defined using linguistic and phonological knowledge of the Lhasa dialect of Tibetan. Experiments are conducted on a Tibetan corpus recorded by 20 speakers, using a bigram language model over phones. The phone error rate (PER) results show that the acoustic model with the CTL set performs best, with a relative accuracy improvement of 10.43% over the basic phoneme set. Moreover, our results confirm that, for Lhasa Tibetan acoustic modeling, the TDNN-HMM paradigm outperforms the conventional GMM-HMM.
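The PER metric used for the comparison above can be illustrated with a minimal sketch. It assumes the common definition of PER as the Levenshtein (edit) distance between reference and hypothesis phone sequences divided by the number of reference phones, and it assumes that a relative improvement is computed as a relative reduction in PER; the abstract does not state the exact formula used. The function names and the toy phone sequences are hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences
    (substitutions, insertions and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phone_error_rate(refs, hyps):
    """PER in percent over a list of utterances (assumed definition)."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total


def relative_reduction(baseline_per, new_per):
    """Relative PER reduction in percent (one assumed convention)."""
    return 100.0 * (baseline_per - new_per) / baseline_per


# Toy example with made-up phone labels, not data from the paper.
refs = [["a", "b", "c", "d"]]
hyps = [["a", "b", "d"]]
per = phone_error_rate(refs, hyps)
```

Under this convention, a system that lowers PER from 40% to 30% would be reported as a 25% relative improvement, even though the absolute difference is 10 percentage points.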
Acknowledgements
This work was supported by the Regional Innovation Cooperation Project of Sichuan Province (Grant No. 22QYCX0082).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Khysru, K., Qie, Y., Shi, H., Sun, Q., Wei, J. (2022). Lhasa Dialect Recognition of Different Phonemes Based on TDNN Method. In: Sun, X., Zhang, X., Xia, Z., Bertino, E. (eds) Artificial Intelligence and Security. ICAIS 2022. Lecture Notes in Computer Science, vol 13339. Springer, Cham. https://doi.org/10.1007/978-3-031-06788-4_13
DOI: https://doi.org/10.1007/978-3-031-06788-4_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06787-7
Online ISBN: 978-3-031-06788-4
eBook Packages: Computer Science, Computer Science (R0)