Speech Recognition Model for Confused Thai Lanna Vocabulary Using Deep Learning Techniques

  • Conference paper
Multi-disciplinary Trends in Artificial Intelligence (MIWAI 2024)

Abstract

This research investigates an approach to developing a speech recognition model for Thai Lanna vocabulary, specifically addressing the challenge of easily confused vocabulary items in the Kham Mueang dialect. It aims to create a robust system that accurately recognizes and interprets the nuances of Northern Thai speech using deep learning techniques, particularly speech transformer models. The work is based on a purpose-built Confused Thai Lanna Vocabulary dataset, collected between 2023 and 2024, which comprises authentic recordings from native speakers and fluent communicators. Four speech recognition models (HuBERT, Wav2Vec2-TH, Wav2Vec2, and WavLM) were employed to recognize and interpret the Confused Thai Lanna Vocabulary. Results indicate that all models adapt effectively to the Lanna language, with HuBERT consistently outperforming the other models across all metrics. Wav2Vec2-TH demonstrated the most balanced performance across word categories, while WavLM excelled at recognizing training words. However, significant challenges remain in processing new words and accent variants, which highlights areas for future research. This study advances ASR technology for regional dialects and contributes to preserving linguistic diversity and promoting inclusive communication in Northern Thailand.
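
The per-category comparison mentioned above (training words, new words, accent variants) could be tabulated along the following lines. This is a minimal sketch, not the authors' evaluation code: the function name `accuracy_by_category`, the category labels, and the sample records are hypothetical placeholders, and the abstract does not state which metrics the paper actually reports.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Return {category: fraction of exactly matched words}.

    Each record is a (category, reference_word, predicted_word) tuple.
    Exact string match is an assumption made for this sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, reference, predicted in records:
        total[category] += 1
        if reference == predicted:
            correct[category] += 1
    return {category: correct[category] / total[category] for category in total}


# Hypothetical placeholder records, NOT data from the study.
records = [
    ("training", "word_a", "word_a"),
    ("training", "word_b", "word_b"),
    ("new", "word_c", "word_d"),
    ("new", "word_e", "word_e"),
    ("accent_variant", "word_f", "word_g"),
]

scores = accuracy_by_category(records)
for category, score in scores.items():
    print(f"{category}: {score:.2f}")
```

Running the same function on each model's predictions would give one row per model, which makes gaps such as weaker results on new words or accent variants easy to see.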

Acknowledgement

This research project was supported by the Thailand Science Research and Innovation Fund and the University of Phayao. It also received support from many advisors, academics, researchers, students, and staff. The authors thank everyone for their support and cooperation in completing this research.

The authors also thank the generative AI tools Claude, ChatGPT, Sci Space, Gemini, and Perplexity for their assistance in gathering information, reading, and providing guidance throughout this research.

Author information

Corresponding author

Correspondence to Pratya Nuankaew.

Ethics declarations

Conflict of Interest

The researchers declare that there is no conflict of interest for this research.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Nuankaew, W.S., Jomsawan, P., Sararat, T., Nuankaew, P. (2025). Speech Recognition Model for Confused Thai Lanna Vocabulary Using Deep Learning Techniques. In: Sombattheera, C., Weng, P., Pang, J. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2024. Lecture Notes in Computer Science, vol 15432. Springer, Singapore. https://doi.org/10.1007/978-981-96-0695-5_8

  • DOI: https://doi.org/10.1007/978-981-96-0695-5_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-96-0694-8

  • Online ISBN: 978-981-96-0695-5

  • eBook Packages: Computer Science, Computer Science (R0)
