Abstract
To make foreign language learning easier, computer-assisted language learning (CALL) systems must recognize learners' utterances with high accuracy, so the quality of a CALL system depends mainly on the accuracy of its automatic speech recognition (ASR). However, because the pronunciation of non-native speakers differs greatly from that of native speakers, existing ASR systems cannot recognize non-native speech accurately. To solve this problem, this research proposes an acoustic model based on deep neural networks (DNN), trained on the ERJ (English Read by Japanese) database collected from 202 Japanese learners. Compared with traditional ASR systems, the new system significantly improves speech recognition accuracy.
Acknowledgements
This work was supported by JSPS KAKENHI Grant Number JP17H00823.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Fu, J., Chiba, Y., Nose, T., Ito, A. (2019). Evaluation of English Speech Recognition for Japanese Learners Using DNN-Based Acoustic Models. In: Pan, JS., Ito, A., Tsai, PW., Jain, L. (eds) Recent Advances in Intelligent Information Hiding and Multimedia Signal Processing. IIH-MSP 2018. Smart Innovation, Systems and Technologies, vol 110. Springer, Cham. https://doi.org/10.1007/978-3-030-03748-2_11
DOI: https://doi.org/10.1007/978-3-030-03748-2_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03747-5
Online ISBN: 978-3-030-03748-2
eBook Packages: Intelligent Technologies and Robotics (R0)