Evaluation of English Speech Recognition for Japanese Learners Using DNN-Based Acoustic Models

Fu, Jiang; Chiba, Yuya; Nose, Takashi; Ito, Akinori

doi:10.1007/978-3-030-03748-2_11

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 110)

Included in the following conference series:

International Conference on Intelligent Information Hiding and Multimedia Signal Processing

Abstract

For computer-assisted language learning (CALL) systems to make foreign language learning easier, they must recognize learners' utterances with high accuracy. The quality of a CALL system depends mainly on the accuracy of automatic speech recognition (ASR). However, because the pronunciation of non-native speakers differs greatly from that of native speakers, existing ASR systems cannot recognize their speech accurately. To solve this problem, this research proposes an acoustic model based on deep neural networks (DNN), trained on the ERJ (English Read by Japanese) database collected from 202 Japanese learners. Compared with traditional ASR systems, the new system significantly improves speech recognition accuracy.
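
The abstract does not state which metric is used to measure recognition accuracy. As a minimal sketch, assuming the common word error rate (WER) measure, the comparison between a reference transcript and a recognizer's output could look like the following. The function name `word_error_rate` and the sample sentences are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: whitespace tokenization and equal cost (1) for every
    substitution, deletion, and insertion. The paper's own scoring
    setup is not described in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical learner utterance and recognizer output.
    print(word_error_rate("i would like some rice", "i would like some lice"))
```

Under this definition, word accuracy is often reported as one minus the WER, so a lower WER for the DNN-based model would correspond to the accuracy improvement the abstract describes.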

Acknowledgements

This work was supported by JSPS KAKENHI Grant Number JP17H00823.

Author information

Authors and Affiliations

Graduate School of Engineering, Tohoku University, Aramaki Aza Aoba 6-6-05, Aoba-Ku, Sendai, Miyagi, 980-8579, Japan
Jiang Fu, Yuya Chiba, Takashi Nose & Akinori Ito

Corresponding author

Correspondence to Akinori Ito.

Editor information

Editors and Affiliations

College of Information Science and Engineering, Fujian University of Technology, Fuzhou, Fujian, China
Jeng-Shyang Pan
Graduate School of Engineering, Tohoku University, Sendai, Miyagi, Japan
Akinori Ito
Swinburne University of Technology, Hawthorn, VIC, Australia
Pei-Wei Tsai
Centre for Artificial Intelligence, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Fu, J., Chiba, Y., Nose, T., Ito, A. (2019). Evaluation of English Speech Recognition for Japanese Learners Using DNN-Based Acoustic Models. In: Pan, JS., Ito, A., Tsai, PW., Jain, L. (eds) Recent Advances in Intelligent Information Hiding and Multimedia Signal Processing. IIH-MSP 2018. Smart Innovation, Systems and Technologies, vol 110. Springer, Cham. https://doi.org/10.1007/978-3-030-03748-2_11

DOI: https://doi.org/10.1007/978-3-030-03748-2_11
Published: 11 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03747-5
Online ISBN: 978-3-030-03748-2
eBook Packages: Intelligent Technologies and Robotics (R0)
