Abstract
This research develops a speech recognition model for Thai Lanna vocabulary, specifically addressing the challenge of confused vocabulary items in the Kham Mueang dialect. The aim is a robust system that accurately recognizes and interprets the nuances of Northern Thai speech using deep learning techniques, particularly speech transformer models. The study is based on a dedicated Confused Thai Lanna Vocabulary dataset, collected between 2023 and 2024, which comprises authentic recordings from native and fluent speakers. Four speech recognition models (HuBERT, Wav2Vec2-TH, Wav2Vec2, and WavLM) were employed to recognize and interpret the Confused Thai Lanna Vocabulary. Results indicate that all models adapt effectively to the Lanna language, with HuBERT consistently outperforming the others on all metrics. Wav2Vec2-TH showed the most balanced performance across word categories, while WavLM excelled at recognizing training words. However, significant challenges remain in processing new words and accent variants, which highlights areas for future research. This study advances automatic speech recognition (ASR) technology for regional dialects and contributes to preserving linguistic diversity and promoting inclusive communication in Northern Thailand.
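The abstract compares models per word category (training words, new words, accent variants) but does not name its metrics in this excerpt. The sketch below assumes character error rate (CER), a common choice for Thai because Thai script does not separate words with spaces, and shows one way per-category scores could be pooled: edit distances are summed per category and divided by the total reference length. The function names, category labels, and sample strings are hypothetical placeholders, not the authors' code or data.

```python
from collections import defaultdict


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted per character."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to hyp prefixes
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer_by_category(samples):
    """Pooled CER per category: total character errors / total reference characters.

    `samples` is an iterable of (category, reference, hypothesis) triples.
    Assumes every category has at least one non-empty reference.
    """
    errors = defaultdict(int)
    chars = defaultdict(int)
    for category, ref, hyp in samples:
        errors[category] += edit_distance(ref, hyp)
        chars[category] += len(ref)
    return {c: errors[c] / chars[c] for c in errors}


# Hypothetical placeholder transcripts, one per category named in the abstract.
samples = [
    ("training", "abcd", "abcd"),  # exact match: 0 errors
    ("new", "abcd", "abxd"),       # 1 substitution
    ("accent", "abcd", "ab"),      # 2 deletions
]
print(cer_by_category(samples))  # {'training': 0.0, 'new': 0.25, 'accent': 0.5}
```

Pooling errors over all utterances in a category, rather than averaging per-utterance rates, keeps short words from dominating the score; either convention is plausible, and the paper's actual choice is not stated here.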
Acknowledgement
This research project was supported by the Thailand Science Research and Innovation Fund and the University of Phayao. It also received support from many advisors, academics, researchers, students, and staff. The authors thank everyone for their support and cooperation in completing this research.
The authors also acknowledge the generative AI tools Claude, ChatGPT, SciSpace, Gemini, and Perplexity for their assistance in gathering information, reading, and providing guidance throughout this research.
Ethics declarations
Conflict of Interest
The researchers declare that there is no conflict of interest for this research.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Nuankaew, W.S., Jomsawan, P., Sararat, T., Nuankaew, P. (2025). Speech Recognition Model for Confused Thai Lanna Vocabulary Using Deep Learning Techniques. In: Sombattheera, C., Weng, P., Pang, J. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2024. Lecture Notes in Computer Science, vol 15432. Springer, Singapore. https://doi.org/10.1007/978-981-96-0695-5_8
DOI: https://doi.org/10.1007/978-981-96-0695-5_8
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0694-8
Online ISBN: 978-981-96-0695-5
eBook Packages: Computer Science, Computer Science (R0)