A review of tools and techniques for computer aided pronunciation training (CAPT) in English

Education and Information Technologies

Abstract

The widespread use of English in academia and business is leading an increasing number of people to learn it as a second or foreign language. Computer aided pronunciation training (CAPT) systems are used by non-native English speakers to improve their English pronunciation. A typical CAPT tool records a learner's speech, detects and diagnoses mispronunciations in it, and suggests ways to correct them. We classified CAPT systems for English into four categories on the basis of the technology they use and studied the salient features of each category. We observed that visual simulation based systems are suitable for young and naive learners; game based systems are advantageous because they can be personalized to the requirements of the learners; comparative phonetics based systems are suitable for adult learners fluent in another language; and artificial neural network based systems have the highest accuracy in mispronunciation diagnosis and are suitable for experienced and professional learners. We identified the state-of-the-art practices used in CAPT systems and observed that CAPT systems can detect up to 86% of the mispronunciations in speech and help learners reduce mispronunciations by up to 23%. We recommend collaboration between language teachers and software developers to develop CAPT tools, wide dissemination of these tools and their integration with the curriculum at school and university levels, and further investigation of mobile and collaborative CAPT systems.

Author information

Corresponding author

Correspondence to Pinaki Chakraborty.

About this article

Cite this article

Agarwal, C., Chakraborty, P. A review of tools and techniques for computer aided pronunciation training (CAPT) in English. Educ Inf Technol 24, 3731–3743 (2019). https://doi.org/10.1007/s10639-019-09955-7

  • DOI: https://doi.org/10.1007/s10639-019-09955-7
