Deep Models for Mispronounce Prediction for Vietnamese Learners of English

Phung, Trang; Vu, Duc-Quang; Mai-Tan, Ha; Nhung, Le Thi

doi:10.1007/978-981-19-8069-5_48

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1688)

Included in the following conference series:

International Conference on Future Data and Security Engineering

Abstract

Accurate pronunciation is an important factor in second language learners’ communication skills. Systems for predicting mispronunciation or assessing pronunciation accuracy for second language learners have therefore been proposed and studied for decades, but the results obtained remain limited. In this paper, we apply two popular deep learning models, a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network, to the problem of predicting mispronunciation by Vietnamese learners of English. This work is a step toward systems that help Vietnamese learners acquire English, and specifically improve their English pronunciation. Experimental results on the L2-ARCTIC dataset show that both models achieve state-of-the-art performance. We also found that the LSTM model outperforms the CNN model by 6.3% in accuracy, which we attribute to the memory mechanism in each LSTM unit. The source code of our approach is available at https://github.com/vdquang1991/Mispronounce_Prediction.
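
The abstract credits the LSTM's advantage to the memory mechanism in each unit. As a rough illustration of that mechanism only, here is a minimal sketch of one single-unit LSTM cell in plain Python, following the standard gate equations of Hochreiter and Schmidhuber. It is not the authors' model: the scalar input, the function names, and the hand-picked weights in PARAMS are all assumptions made for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell (scalar input and state)."""

    def gate(name, activation):
        # Each gate has an input weight, a recurrent weight, and a bias.
        w, u, b = params[name]
        return activation(w * x + u * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old memory to keep
    i = gate("input", sigmoid)         # how much new information to write
    g = gate("candidate", math.tanh)   # the new information itself
    o = gate("output", sigmoid)        # how much of the memory to expose

    c = f * c_prev + i * g             # cell state: the unit's memory
    h = o * math.tanh(c)               # hidden state passed to the next step
    return h, c


def run_sequence(xs, params):
    """Feed a sequence through the cell, starting from zero state."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, params)
    return h, c


# Hypothetical, hand-picked weights (not trained, not from the paper).
# The forget gate stays close to 1, so whatever is written is retained.
PARAMS = {
    "forget": (0.0, 0.0, 4.0),
    "input": (4.0, 0.0, -2.0),
    "candidate": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 4.0),
}

if __name__ == "__main__":
    # An early event followed by silence still shows up in the final state.
    print(run_sequence([1.0, 0.0, 0.0, 0.0, 0.0], PARAMS))
    print(run_sequence([0.0, 0.0, 0.0, 0.0, 0.0], PARAMS))
```

With these weights, an input seen at the first step remains in the cell state several steps later, whereas an all-zero sequence leaves the state at zero. In a real model the weights would be learned and the inputs would be acoustic feature vectors rather than scalars.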

Author information

Authors and Affiliations

Thai Nguyen University, Thai Nguyen, Vietnam
Trang Phung & Le Thi Nhung
Thai Nguyen University of Education, Thai Nguyen, Vietnam
Duc-Quang Vu
National Central University, Taoyuan, Taiwan
Duc-Quang Vu
National Taiwan University, New Taipei, Taiwan
Ha Mai-Tan

Corresponding author

Correspondence to Duc-Quang Vu.

Editor information

Editors and Affiliations

Ho Chi Minh City University of Food Industry, Ho Chi Minh City, Vietnam
Tran Khanh Dang
Johannes Kepler University of Linz, Linz, Austria
Josef Küng
Sungkyunkwan University, Seoul, Korea (Republic of)
Tai M. Chung

About this paper

Cite this paper

Phung, T., Vu, DQ., Mai-Tan, H., Nhung, L.T. (2022). Deep Models for Mispronounce Prediction for Vietnamese Learners of English. In: Dang, T.K., Küng, J., Chung, T.M. (eds) Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications. FDSE 2022. Communications in Computer and Information Science, vol 1688. Springer, Singapore. https://doi.org/10.1007/978-981-19-8069-5_48

DOI: https://doi.org/10.1007/978-981-19-8069-5_48
Published: 20 November 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-8068-8
Online ISBN: 978-981-19-8069-5
eBook Packages: Computer Science, Computer Science (R0)
