Lhasa Dialect Recognition of Different Phonemes Based on TDNN Method

Khysru, Kuntharrgyal; Qie, Yangzhuoma; Shi, Haiqiang; Sun, Qilong; Wei, Jianguo

doi:10.1007/978-3-031-06788-4_13

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13339)

Included in the following conference series:

International Conference on Artificial Intelligence and Security

Abstract

Time-delay neural network (TDNN) acoustic models have advanced significantly in recent years, outperforming the traditional Gaussian mixture model hidden Markov model (GMM-HMM) in large-vocabulary continuous speech recognition tasks. We aim to develop a practical Lhasa Tibetan automatic speech recognition (ASR) system. To improve recognition accuracy, this paper investigates the performance of TDNN-based Tibetan acoustic models built on several different phoneme sets, each defined from linguistic and phonological knowledge of the Lhasa Tibetan dialect. Experiments are conducted on a Tibetan corpus recorded by 20 speakers, using a phone bigram language model. The phone error rate (PER) results show that the acoustic model using the CTL set performs best, with a relative accuracy improvement of 10.43% over the basic phoneme set. Moreover, our results confirm that, for Lhasa Tibetan acoustic modeling, the TDNN-HMM paradigm outperforms the conventional GMM-HMM.
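The abstract reports its results as phone error rate (PER) and as a relative improvement, but does not spell out how either is computed. The following is a minimal Python sketch of the standard definitions, assuming PER is the Levenshtein distance (substitutions + deletions + insertions) between the reference and hypothesis phone sequences divided by the number of reference phones. The function names, toy phone labels, and PER values are hypothetical illustrations; they are not the authors' tooling, data, or results.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] = distance between the reference phones seen so far (minus one) and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference phone r
                curr[j - 1] + 1,         # insert hypothesis phone h
                prev[j - 1] + (r != h),  # substitute (cost 0 if the phones match)
            )
        prev = curr
    return prev[-1]


def phone_error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Corpus-level PER: total edit operations divided by total reference phones."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


def relative_error_reduction(baseline_per: float, new_per: float) -> float:
    """Fraction by which new_per lowers baseline_per."""
    return (baseline_per - new_per) / baseline_per


def relative_accuracy_gain(baseline_per: float, new_per: float) -> float:
    """Fraction by which phone accuracy (1 - PER) rises over the baseline."""
    return ((1 - new_per) - (1 - baseline_per)) / (1 - baseline_per)


if __name__ == "__main__":
    # Toy, made-up phone sequences (not from the paper's corpus).
    ref = ["b", "o", "d", "k", "a"]
    hyp = ["b", "o", "t", "k", "a", "a"]  # one substitution, one insertion
    print(phone_error_rate([ref], [hyp]))        # 0.4
    # Hypothetical PERs for a baseline and an alternative phoneme set.
    print(relative_error_reduction(0.20, 0.15))  # about 0.25
    print(relative_accuracy_gain(0.20, 0.15))    # about 0.0625
```

The hypothetical numbers show why the two relative measures should not be confused: lowering PER from 0.20 to 0.15 is a 25% relative error reduction but only a 6.25% relative gain in phone accuracy.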
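The abstract does not give the network configuration. As a rough illustration of what distinguishes a TDNN from a frame-by-frame feed-forward network, the sketch below implements one time-delay layer in plain Python: each output frame is an affine transform (followed here by a ReLU) of input frames spliced at fixed context offsets, and stacking layers widens the temporal context seen by the top layer. The function name, offsets, layer sizes, constant weights, ReLU activation, and no-padding convention are all assumptions chosen for illustration, not the authors' architecture.

```python
from typing import List, Sequence

Frame = List[float]


def tdnn_layer(frames: Sequence[Frame], offsets: Sequence[int],
               weights: List[List[float]], bias: List[float]) -> List[Frame]:
    """One time-delay layer: splice input frames at t + o for each offset o,
    then apply an affine transform and a ReLU.

    Outputs are produced only for steps t where every t + o is a valid index
    (no padding), so the output sequence is shorter than the input.
    """
    left = max(0, -min(offsets))
    right = max(0, max(offsets))
    out: List[Frame] = []
    for t in range(left, len(frames) - right):
        # Concatenate the context frames into one input vector.
        spliced = [x for o in offsets for x in frames[t + o]]
        if len(spliced) != len(weights[0]):
            raise ValueError("weight rows must match len(offsets) * frame dim")
        pre = [sum(w * x for w, x in zip(row, spliced)) + b
               for row, b in zip(weights, bias)]
        out.append([max(0.0, v) for v in pre])  # ReLU
    return out


if __name__ == "__main__":
    # 10 frames of 3-dim toy features (real systems use e.g. MFCC vectors).
    feats = [[0.1 * t, 0.2, -0.3] for t in range(10)]
    # Layer 1: context {-1, 0, +1}, 3 frames x 3 dims -> 4 units.
    h1 = tdnn_layer(feats, (-1, 0, 1), [[0.1] * 9 for _ in range(4)], [0.0] * 4)
    # Layer 2: wider, sparser context {-3, 0, +3}, 3 frames x 4 dims -> 2 units.
    h2 = tdnn_layer(h1, (-3, 0, 3), [[0.1] * 12 for _ in range(2)], [0.0] * 2)
    print(len(h1), len(h2))  # 8 2
```

Each output of the second layer depends on input frames t - 4 through t + 4 (offsets of ±1 composed with ±3), so two layers that each look at only three frames together cover a 9-frame window. Using sparse offsets in higher layers is one common way to widen the context without splicing every frame; a trained system would learn the weights and use much wider layers.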

Acknowledgements

This work was supported by the Regional Innovation Cooperation Project of Sichuan Province (Grant No. 22QYCX0082).

Author information

Authors and Affiliations

Key Laboratory of Artificial Intelligence Application Technology, State Ethnic Affairs Commission, Qinghai Minzu University, Xining, 810007, China
Kuntharrgyal Khysru, Yangzhuoma Qie, Qilong Sun & Jianguo Wei
Institute of “TWO-BOMBS&ONE-SATELLITE” Ideals and Beliefs, Beijing, 810299, China
Haiqiang Shi
Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, 300072, China
Jianguo Wei

Corresponding author

Correspondence to Kuntharrgyal Khysru.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Xingming Sun
Nanjing University of Information Science and Technology, Nanjing, China
Xiaorui Zhang
Jinan University, Guangzhou, China
Zhihua Xia
Purdue University, West Lafayette, IN, USA
Elisa Bertino

About this paper

Cite this paper

Khysru, K., Qie, Y., Shi, H., Sun, Q., Wei, J. (2022). Lhasa Dialect Recognition of Different Phonemes Based on TDNN Method. In: Sun, X., Zhang, X., Xia, Z., Bertino, E. (eds) Artificial Intelligence and Security. ICAIS 2022. Lecture Notes in Computer Science, vol 13339. Springer, Cham. https://doi.org/10.1007/978-3-031-06788-4_13

DOI: https://doi.org/10.1007/978-3-031-06788-4_13
Published: 04 July 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06787-7
Online ISBN: 978-3-031-06788-4
eBook Packages: Computer Science, Computer Science (R0)