Abstract
This paper describes a speech denoising system based on long short-term memory (LSTM) neural networks. The network performs speech enhancement in the spectrogram magnitude domain; the audio is then resynthesized via the inverse short-time Fourier transform while retaining the original phase. Objective quality is assessed by the root mean square error between clean and denoised audio signals on the CHiME corpus, and by the speaker verification rate on the RSR2015 corpus. The proposed system demonstrates improved results on both metrics.
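The enhance-magnitude-then-reuse-phase pipeline described in the abstract can be sketched as follows. This is a minimal illustration assuming SciPy's STFT/ISTFT; the `enhance_mag` callable is a hypothetical stand-in for the paper's LSTM network, and the RMSE function mirrors the objective metric used for evaluation.

```python
import numpy as np
from scipy.signal import stft, istft

def denoise_with_original_phase(noisy, enhance_mag, fs=16000, nperseg=512):
    """Enhance the STFT magnitude, resynthesize with the original phase."""
    # Short-time Fourier transform of the noisy signal
    _, _, Z = stft(noisy, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)
    # Magnitude enhancement (in the paper, an LSTM network's prediction)
    mag_hat = enhance_mag(mag)
    # Inverse STFT, keeping the original (noisy) phase
    _, x_hat = istft(mag_hat * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return x_hat

def rmse(clean, denoised):
    """Root mean square error between clean and denoised signals."""
    n = min(len(clean), len(denoised))
    return np.sqrt(np.mean((clean[:n] - denoised[:n]) ** 2))
```

With an identity `enhance_mag` (no enhancement), the round trip through STFT and ISTFT reconstructs the input almost exactly, which is a useful sanity check before plugging in a trained model.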
© 2017 Springer International Publishing AG
Tkachenko, M., Yamshinin, A., Lyubimov, N., Kotov, M., Nastasenko, M. (2017). Speech Enhancement for Speaker Recognition Using Deep Recurrent Neural Networks. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science(), vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_69
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3