Abstract
This work proposes a test-time fine-tuning scheme to further improve the performance of an already-trained Denoising AutoEncoder (DAE) in the context of semi-supervised audio source separation. Although state-of-the-art deep learning-based DAEs show sensible denoising performance when the nature of the artifacts is known in advance, the scalability of an already-trained network to an unseen signal with unknown deformation characteristics is not well studied. To handle this problem, we propose an adaptive fine-tuning scheme in which we define test-time target variables so that a DAE can learn from the newly available sources and mixing environments in the test mixtures. In the proposed network topology, we stack an AutoEncoder (AE) trained on clean source spectra of interest on top of a DAE trained on a variety of available mixture spectra. The bottom DAE's outputs are fed to the top AE, which checks the purity of the once-denoised DAE output. The top AE's error is then used to fine-tune the bottom DAE during the test phase. Experimental results on audio source separation tasks demonstrate that the proposed fine-tuning technique can further improve the sound quality of a DAE at test time.
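The adaptation signal described in the abstract can be sketched in a deliberately simplified form. The toy below is not the paper's nonlinear network: both "autoencoders" are reduced to single matrices, the frozen top AE is modeled as an orthogonal projector onto a hypothetical clean-source subspace (so it reproduces exactly the signals it considers pure), and the bottom DAE is a matrix adapted at test time by gradient descent on the top AE's reconstruction error. All dimensions and data here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy spectral-frame dimension

# Frozen "top" AE: an orthogonal projector onto a hypothetical 3-dimensional
# clean-source subspace, so AE(v) == v exactly when v lies in that subspace.
B = rng.standard_normal((d, 3))
V = B @ np.linalg.inv(B.T @ B) @ B.T  # symmetric projector onto span(B)

# "Pretrained" bottom DAE, modeled as a single matrix near the identity.
W = np.eye(d) + 0.1 * rng.standard_normal((d, d))

# Unlabeled test-mixture frames (one frame per column).
X = rng.standard_normal((d, 32))

def purity_error(W):
    """Top-AE reconstruction error on the DAE outputs: how 'clean' they look."""
    Y = W @ X
    return np.mean((V @ Y - Y) ** 2)

initial = purity_error(W)
lr = 0.01
for _ in range(500):
    Y = W @ X
    R = V @ Y - Y  # fine-tuning signal: the top AE's residual on the DAE output
    # Gradient of mean ||V W X - W X||^2 w.r.t. W; only the bottom DAE moves,
    # the top AE stays frozen, mirroring the stacked topology in the abstract.
    W -= lr * 2.0 * (V - np.eye(d)).T @ R @ X.T / X.shape[1]

final = purity_error(W)
```

After adaptation the bottom matrix maps the test frames (approximately) into the assumed clean-source subspace, so `final` is far below `initial`. The actual method backpropagates the same kind of top-AE error through nonlinear layers instead of a closed-form linear gradient.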
Notes
1. In this section we use terminology from NMF-based models, without loss of generality with respect to other latent variable models.
2. We drop the absolute-value function \(|\cdot|\) from now on for brevity, but readers should be aware that mixing in the time domain does not carry over to the magnitude Fourier domain.
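The caveat in the second note can be checked numerically: by the triangle inequality, the magnitude spectrum of a mixture is bounded above by, but generally not equal to, the sum of the sources' magnitude spectra. A small NumPy sketch with two synthetic noise sources:

```python
import numpy as np

rng = np.random.default_rng(1)
s1 = rng.standard_normal(256)  # two toy time-domain sources
s2 = rng.standard_normal(256)

mag_of_mix = np.abs(np.fft.rfft(s1 + s2))                        # |FFT(s1 + s2)|
sum_of_mags = np.abs(np.fft.rfft(s1)) + np.abs(np.fft.rfft(s2))  # |FFT(s1)| + |FFT(s2)|

# Triangle inequality holds bin-wise; equality would require the two sources'
# phases to align in every frequency bin, which generically never happens.
assert np.all(mag_of_mix <= sum_of_mags + 1e-9)
assert not np.allclose(mag_of_mix, sum_of_mags)
```

This is why treating magnitude spectra as additive, as many source-separation models do, is only an approximation.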
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Kim, M., Smaragdis, P. (2015). Adaptive Denoising Autoencoders: A Fine-Tuning Scheme to Learn from Test Mixtures. In: Vincent, E., Yeredor, A., Koldovský, Z., Tichavský, P. (eds) Latent Variable Analysis and Signal Separation. LVA/ICA 2015. Lecture Notes in Computer Science(), vol 9237. Springer, Cham. https://doi.org/10.1007/978-3-319-22482-4_12
Print ISBN: 978-3-319-22481-7
Online ISBN: 978-3-319-22482-4