The System for Detecting Vietnamese Mispronunciation

Minh, Nguyen Quang; Hung, Phan Duy

doi:10.1007/978-981-16-8062-5_32

Nguyen Quang Minh & Phan Duy Hung

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1500)

Included in the following conference series:

International Conference on Future Data and Security Engineering


Abstract

The deepening international integration of Vietnam is opening up employment opportunities for more and more foreigners, and the demand for learning Vietnamese is therefore rising. However, Vietnamese can be difficult to pronounce for those who are just getting started because of its complex vowels and tone marks. A Computer-Assisted Pronunciation Training (CAPT) system can significantly improve learners' pronunciation, and the mispronunciation detection model plays an important role in this type of system. Since research on Vietnamese is still limited, this research introduces a process for building a new mispronunciation detection model for the language. Besides comparing some of the current advanced techniques for this task, such as Goodness of Pronunciation (GOP) for scoring and deep neural networks for classification, the paper introduces a process for building a Vietnamese vowel and tone test set to evaluate the performance of the system. At its best, the proposed model achieved an F1 score of 0.54 and a Pearson correlation coefficient (PCC) of 0.46 on the constructed test set. This is a reasonable result compared with a PCC of 0.45 on an English test set.
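The abstract does not state which GOP variant, classifier, or decision threshold the authors used, so the following is only an illustrative sketch, not the paper's implementation. It assumes frame-level phone posteriors from a DNN acoustic model are already available and force-aligned to phone segments, and computes a posterior-ratio GOP in the style of Hu et al. (the frame average of the log ratio between the canonical phone's posterior and the best-scoring phone's posterior). It then flags a phone as mispronounced below a threshold and shows the two metrics the abstract reports, F1 and PCC. The function names (`gop_score`, `is_mispronounced`, `f1_score`, `pearson`), the threshold of -1.0, and the toy vowel posteriors are hypothetical.

```python
import math
from statistics import mean


def gop_score(frame_posteriors, canonical, floor=1e-10):
    """Posterior-ratio GOP for one force-aligned phone segment (sketch).

    frame_posteriors: one dict per frame, mapping phone label -> posterior P(q | o_t)
    canonical: the phone the learner was expected to produce
    Returns the frame average of log(P(canonical | o_t) / max_q P(q | o_t)),
    which is <= 0 and equals 0 when the canonical phone wins every frame.
    """
    total = 0.0
    for frame in frame_posteriors:
        # Floor the canonical posterior so log() never receives zero.
        p_canonical = max(frame.get(canonical, 0.0), floor)
        total += math.log(p_canonical / max(frame.values()))
    return total / len(frame_posteriors)


def is_mispronounced(gop, threshold=-1.0):
    # Hypothetical decision rule: a low GOP means the canonical phone was not recognized.
    return gop < threshold


def f1_score(y_true, y_pred):
    """F1 with 'mispronounced' (True) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pearson(xs, ys):
    """Pearson correlation coefficient, e.g. between model scores and human ratings.

    Raises ZeroDivisionError if either sequence is constant.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy posteriors over three Vietnamese vowels when "ă" was expected.
    good = [{"a": 0.1, "ă": 0.8, "â": 0.1}, {"a": 0.2, "ă": 0.7, "â": 0.1}]
    bad = [{"a": 0.8, "ă": 0.1, "â": 0.1}, {"a": 0.7, "ă": 0.2, "â": 0.1}]
    print(is_mispronounced(gop_score(good, "ă")))  # False
    print(is_mispronounced(gop_score(bad, "ă")))   # True
```

A fixed threshold is the simplest way to turn GOP scores into decisions; the deep neural network classification mentioned in the abstract would instead learn that decision from features such as these scores.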


References

Witt, S.M., Young, S.J.: Phone-level pronunciation scoring and assessment for interactive language learning. Speech Commun. 95–108 (2000)
Hu, W., Qian, Y., Soong, F., Wang, Y.: Improved mispronunciation detection with deep neural network trained acoustic models and transfer learning based logistic regression classifiers. Speech Commun. 154–166 (2015)
Pham, N.M., Vu, H.Q.: Acceleration in state of the art ASR applied to a Vietnamese transcription system. J. Comput. Sci. Cybern. 365–372 (2018)
Luong, H.T., Vu, H.Q.: A non-expert Kaldi recipe for Vietnamese speech recognition system. In: Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies WLSI/OIAF4HLT@COLING, pp. 51–55 (2016)
Su, B., Mao, S., Soong, F., Xia, Y., Tien, J., Wu, Z.: Improving pronunciation assessment via ordinal regression with anchored reference samples. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7748–7752 (2021)
Shi, J., Huo, N., Jin, Q.: Context-aware goodness of pronunciation for computer-assisted pronunciation training. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 2020-October (2020)
Korzekwa, D., Trueba, J.L., Zaporowski, S., Calamaro, S., Drugman, T., Kostek, B.: Mispronunciation detection in non-native (L2) English with uncertainty modeling. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2021)
Wang, Z., Zhang, J., Xie, Y.: L2 mispronunciation verification based on acoustic phone embedding and Siamese networks. In: Proceedings of the 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 444–448 (2018)
Snyder, D., Chen, G., Povey, D.: MUSAN: a music, speech, and noise corpus. ArXiv (2015)
Zhang, J., et al.: Speechocean762: an open-source non-native English speech corpus for pronunciation assessment. ArXiv (2021)


Author information

Authors and Affiliations

FPT University, Hanoi, Vietnam
Nguyen Quang Minh & Phan Duy Hung


Corresponding author

Correspondence to Phan Duy Hung.

Editor information

Editors and Affiliations

HCMC University of Technology (HCMUT), Ho Chi Minh City, Vietnam
Tran Khanh Dang
Johannes Kepler University of Linz, Linz, Austria
Josef Küng
Sungkyunkwan University, Suwon, Korea (Republic of)
Tai M. Chung
Hosei University, Tokyo, Japan
Makoto Takizawa

About this paper

Cite this paper

Minh, N.Q., Hung, P.D. (2021). The System for Detecting Vietnamese Mispronunciation. In: Dang, T.K., Küng, J., Chung, T.M., Takizawa, M. (eds) Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications. FDSE 2021. Communications in Computer and Information Science, vol 1500. Springer, Singapore. https://doi.org/10.1007/978-981-16-8062-5_32

DOI: https://doi.org/10.1007/978-981-16-8062-5_32
Published: 14 November 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-8061-8
Online ISBN: 978-981-16-8062-5
eBook Packages: Computer Science, Computer Science (R0)
