Speech Recognition Model for Confused Thai Lanna Vocabulary Using Deep Learning Techniques

Nuankaew, Wongpanya S.; Jomsawan, Pathapol; Sararat, Thapanapong; Nuankaew, Pratya

doi:10.1007/978-981-96-0695-5_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 15432)

Included in the following conference series:

International Conference on Multi-disciplinary Trends in Artificial Intelligence

Abstract

This research investigates an approach to developing a speech recognition model for Thai Lanna vocabulary, explicitly addressing the challenge of easily confused vocabulary items in the Kham Mueang dialect. It aims to create a robust system that accurately recognizes and interprets the nuances of Northern Thai speech using deep learning techniques, particularly speech transformer models. The research is based on a unique Confused Thai Lanna Vocabulary dataset, collected between 2023 and 2024, which comprises authentic recordings from native speakers and fluent communicators. Four speech recognition models (HuBERT, Wav2Vec2-TH, Wav2Vec2, and WavLM) were employed to enhance the recognition and interpretation of the Confused Thai Lanna Vocabulary. Results indicate that all models effectively adapt to the Lanna language, with HuBERT consistently outperforming the other models across all metrics. Wav2Vec2-TH demonstrated the most balanced performance across different word categories, while WavLM excelled in recognizing training words. However, significant challenges remain in processing new words and accent variants, highlighting areas for future research. This study advances automatic speech recognition (ASR) technology for regional dialects and contributes to preserving linguistic diversity and promoting inclusive communication in Northern Thailand.
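
The abstract compares the models "across all metrics" without naming them. As an illustration only, the sketch below computes two metrics commonly used to score ASR output, word error rate (WER) and character error rate (CER), from a token-level edit distance. Whether the paper reports these particular metrics is an assumption, and the function names (`edit_distance`, `error_rate`, `wer`, `cer`) are hypothetical, not taken from the authors' code.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev holds the distances for the previous reference prefix.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalised by the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens (assumed tokenisation)."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, which avoids the need for word segmentation."""
    return error_rate(list(reference), list(hypothesis))
```

For a vocabulary-level task such as this one, a character-based rate may be the more practical choice, since Thai-script text is not whitespace-delimited and a word-level rate would first require a segmentation step.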

Acknowledgement

This research project was supported by the Thailand Science Research and Innovation Fund and the University of Phayao. It also received support from many advisors, academics, researchers, students, and staff. The authors thank everyone for their support and cooperation in completing this research.

The authors also thank the generative AI tools Claude, ChatGPT, SciSpace, Gemini, and Perplexity for their assistance in gathering information, reading, and providing guidance throughout this research.

Author information

Authors and Affiliations

School of Information and Communication Technology, University of Phayao, Phayao, 56000, Thailand
Wongpanya S. Nuankaew, Pathapol Jomsawan, Thapanapong Sararat & Pratya Nuankaew

Corresponding author

Correspondence to Pratya Nuankaew.

Editor information

Editors and Affiliations

Mahasarakham University, Mahasarakham, Thailand
Chattrakul Sombattheera
Duke Kunshan University, Kunshan, China
Paul Weng
University of Luxembourg, Esch-sur-Alzette, Luxembourg
Jun Pang

Ethics declarations

Conflict of Interest

The researchers declare that there is no conflict of interest for this research.

About this paper

Cite this paper

Nuankaew, W.S., Jomsawan, P., Sararat, T., Nuankaew, P. (2025). Speech Recognition Model for Confused Thai Lanna Vocabulary Using Deep Learning Techniques. In: Sombattheera, C., Weng, P., Pang, J. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2024. Lecture Notes in Computer Science, vol 15432. Springer, Singapore. https://doi.org/10.1007/978-981-96-0695-5_8

DOI: https://doi.org/10.1007/978-981-96-0695-5_8
Published: 20 February 2025
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0694-8
Online ISBN: 978-981-96-0695-5
eBook Packages: Computer Science, Computer Science (R0)