Abstract
Accurate pronunciation is an important factor in the communication skills of second language learners. Systems that predict mispronunciation or assess pronunciation accuracy for these learners have therefore been proposed and studied for decades, yet the results obtained so far remain limited. In this paper, we apply two popular deep learning models, a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network, to the problem of predicting incorrect pronunciation for Vietnamese learners of English. This work supports the development of systems that help Vietnamese learners acquire English, specifically by improving their pronunciation. Experimental results on the L2-ARCTIC dataset show that both models achieve state-of-the-art performance. We also find that the LSTM model outperforms the CNN model by 6.3% in accuracy, which we attribute to the memory mechanism in each LSTM unit. The source code of our approach is available at https://github.com/vdquang1991/Mispronounce_Prediction.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Phung, T., Vu, DQ., Mai-Tan, H., Nhung, L.T. (2022). Deep Models for Mispronounce Prediction for Vietnamese Learners of English. In: Dang, T.K., Küng, J., Chung, T.M. (eds) Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications. FDSE 2022. Communications in Computer and Information Science, vol 1688. Springer, Singapore. https://doi.org/10.1007/978-981-19-8069-5_48
DOI: https://doi.org/10.1007/978-981-19-8069-5_48
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-8068-8
Online ISBN: 978-981-19-8069-5
eBook Packages: Computer Science, Computer Science (R0)