Optimization of Gain in Symmetrized Itakura-Saito Discrimination for Pronunciation Learning

Savchenko, Andrey V.; Savchenko, Vladimir V.; Savchenko, Lyudmila V.

doi:10.1007/978-3-030-49988-4_30

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12095)

Included in the following conference series:

International Conference on Mathematical Optimization Theory and Operations Research

Abstract

This paper addresses the assessment of pronunciation quality in computer-aided language learning systems. We propose a novel distortion measure for speech processing, obtained by optimizing the gain in the symmetrized Itakura-Saito divergence. This dissimilarity measure is built into a complete algorithm for pronunciation learning and improvement. In the first stage, the user must achieve a stable pronunciation of every sound by matching it against the sounds of an ideal speaker. In the second stage, sounds and their short sequences are recognized to ensure that the learned sounds are distinguishable. The training set may contain not only the ideal sounds but also the user's best utterances from the previous stage. Finally, word recognition accuracy is estimated with deep neural networks fine-tuned on the user's best words. An experimental study shows that the proposed procedure makes learning sounds and their sequences highly efficient, even when the observed utterance is noisy.
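
The abstract does not state the formula for the proposed measure, so the following is only a minimal sketch under the textbook definition of the Itakura-Saito divergence between two positive power spectra, not necessarily the authors' formulation. Summing the divergence in both directions cancels the logarithmic terms, and a scalar gain g applied to one spectrum can then be optimized in closed form. The function names, the use of plain lists of spectral samples, and the choice to scale the second spectrum are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def _check(p: Sequence[float], q: Sequence[float]) -> None:
    # Both spectra must be non-empty, equally long, and strictly positive.
    if len(p) != len(q) or len(p) == 0:
        raise ValueError("spectra must be non-empty and of equal length")
    if any(x <= 0 for x in p) or any(x <= 0 for x in q):
        raise ValueError("power spectra must be strictly positive")


def symmetrized_is(p: Sequence[float], q: Sequence[float]) -> float:
    """Symmetrized Itakura-Saito divergence D(p||q) + D(q||p).

    With D(p||q) = mean(p/q - log(p/q) - 1), the log terms cancel and
    the sum reduces to mean(p/q + q/p) - 2.
    """
    _check(p, q)
    n = len(p)
    return sum(pi / qi + qi / pi for pi, qi in zip(p, q)) / n - 2.0


def gain_optimized_sis(p: Sequence[float], q: Sequence[float]) -> Tuple[float, float]:
    """Minimize symmetrized_is(p, g*q) over the scalar gain g > 0.

    The objective is a/g + b*g - 2 with a = mean(p/q) and b = mean(q/p),
    so the minimizer is g = sqrt(a/b) and the minimum is 2*sqrt(a*b) - 2.
    Returns (optimal gain, minimal divergence).
    """
    _check(p, q)
    n = len(p)
    a = sum(pi / qi for pi, qi in zip(p, q)) / n
    b = sum(qi / pi for pi, qi in zip(p, q)) / n
    gain = math.sqrt(a / b)
    # a*b >= 1 by the Cauchy-Schwarz inequality; clamp rounding noise at zero.
    divergence = max(0.0, 2.0 * math.sqrt(a * b) - 2.0)
    return gain, divergence


if __name__ == "__main__":
    reference = [1.0, 4.0, 2.0, 0.5]          # hypothetical "ideal speaker" spectrum
    louder = [3.0 * x for x in reference]     # same shape, different loudness
    print(symmetrized_is(louder, reference))      # penalizes the level difference
    print(gain_optimized_sis(louder, reference))  # gain absorbs the level difference
```

In this sketch, two spectra that differ only in overall level have a nonzero plain symmetrized divergence, while the gain-optimized value is zero up to rounding, which is the practical motivation for making a pronunciation score insensitive to how loudly the user speaks.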

Acknowledgements

The work was prepared within the framework of the Basic Research Program at the National Research University Higher School of Economics (HSE).

Author information

Authors and Affiliations

Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia
Andrey V. Savchenko
Nizhny Novgorod State Linguistic University, Nizhny Novgorod, Russia
Vladimir V. Savchenko
Department of Information Systems and Technologies, National Research University Higher School of Economics, Nizhny Novgorod, Russia
Lyudmila V. Savchenko

Corresponding author

Correspondence to Andrey V. Savchenko.

Editor information

Editors and Affiliations

Sobolev Institute of Mathematics SB RAS, Novosibirsk, Russia
Alexander Kononov
Krasovsky Institute of Mathematics and Mechanics, Yekaterinburg, Russia
Michael Khachay
National Research University Higher School of Economics, Nizhny Novgorod, Russia
Valery A. Kalyagin
University of Florida, Gainesville, FL, USA
Panos Pardalos

About this paper

Cite this paper

Savchenko, A.V., Savchenko, V.V., Savchenko, L.V. (2020). Optimization of Gain in Symmetrized Itakura-Saito Discrimination for Pronunciation Learning. In: Kononov, A., Khachay, M., Kalyagin, V., Pardalos, P. (eds) Mathematical Optimization Theory and Operations Research. MOTOR 2020. Lecture Notes in Computer Science, vol 12095. Springer, Cham. https://doi.org/10.1007/978-3-030-49988-4_30

DOI: https://doi.org/10.1007/978-3-030-49988-4_30
Published: 29 June 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49987-7
Online ISBN: 978-3-030-49988-4
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)