Part of the book series: Communications in Computer and Information Science (CCIS, volume 1500)

Abstract

The deepening international integration of Vietnam is opening job opportunities for more and more foreigners, so the demand for learning Vietnamese is rising. However, Vietnamese can be difficult for beginners to pronounce because of its complex vowels and tone marks. A Computer-Assisted Pronunciation Training (CAPT) system can significantly improve learners' pronunciation, and a mispronunciation detection model plays an important role in this type of system. Because research on Vietnamese remains limited, this work introduces a process for building a new mispronunciation detection model for the language. Besides comparing current advanced techniques for this task, such as Goodness of Pronunciation (GOP) for scoring and deep neural networks for classification, it also introduces a process for building a Vietnamese vowel and tone test set to evaluate the system. The best proposed model achieved an F1 score of 0.54 and a Pearson correlation coefficient (PCC) of 0.46 on this test set. This is a reasonable result compared with a PCC of 0.45 on an English test set.
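As a rough illustration of the scoring and evaluation terms used in the abstract, the Python sketch below computes a simplified GOP score (the average log posterior of the expected phone over its aligned frames) and the two reported metrics, F1 and PCC. It is not the authors' implementation: the function names, the toy posteriors, and the decision threshold are all hypothetical, and the paper's own GOP variant may differ.

```python
import math

def gop_score(posteriors, phone_index):
    """Simplified GOP: mean log posterior of the expected phone.

    posteriors: list of per-frame probability lists (assumed to come from
    an acoustic model, already aligned to one phone segment).
    """
    eps = 1e-10  # guard against log(0)
    total = sum(math.log(max(frame[phone_index], eps)) for frame in posteriors)
    return total / len(posteriors)

def is_mispronounced(score, threshold):
    """Flag a phone when its GOP falls below a (hypothetical) threshold."""
    return score < threshold

def f1_score(y_true, y_pred):
    """F1 for binary labels, where 1 marks a mispronounced phone."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Toy example: three frames, two phone classes, expected phone is class 0.
frames = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
score = gop_score(frames, phone_index=0)
print(score, is_mispronounced(score, threshold=-1.0))
```

In a setup like this, F1 would be computed on the thresholded decisions against human labels, while PCC would compare continuous scores with human ratings.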

Notes

  1. https://github.com/kaldi-asr/kaldi

  2. https://vlsp.org.vn/vlsp2020

Author information

Corresponding author

Correspondence to Phan Duy Hung.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Minh, N.Q., Hung, P.D. (2021). The System for Detecting Vietnamese Mispronunciation. In: Dang, T.K., Küng, J., Chung, T.M., Takizawa, M. (eds) Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications. FDSE 2021. Communications in Computer and Information Science, vol 1500. Springer, Singapore. https://doi.org/10.1007/978-981-16-8062-5_32

  • DOI: https://doi.org/10.1007/978-981-16-8062-5_32

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-8061-8

  • Online ISBN: 978-981-16-8062-5

  • eBook Packages: Computer Science, Computer Science (R0)
