A Variational Autoencoder Approach for Speech Signal Separation

Do, Hao D.; Tran, Son T.; Chau, Duc T.

doi:10.1007/978-3-030-63007-2_43

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12496)

Included in the following conference series:

International Conference on Computational Collective Intelligence

Abstract

Speech separation plays an important role in speech-related systems because it can denoise, extract, and enhance speech signals. In recent years, many methods have been proposed to separate the human voice from noise and other sounds. To separate speech from a complicated signal, we propose a more powerful method that uses a VAE model followed by post-processing with a bandpass filter. This combination can be used to extract the original human speech from a mixture containing not only high-frequency noise but also many different sounds. Our approach can be flexibly applied to new background sounds.
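The bandpass post-processing step named in the abstract can be illustrated with a minimal sketch. The abstract does not state the filter design, the cutoff frequencies, or the sample rate, so everything below is an assumption: a second-order (biquad) bandpass filter using the widely circulated Audio EQ Cookbook formulas, a 16 kHz sample rate, and a hypothetical 300 to 3400 Hz speech band. The function names are illustrative, the VAE stage is omitted, and this is not the authors' implementation.

```python
import math


def bandpass_coefficients(sample_rate, low_hz, high_hz):
    """Return (b, a) for a biquad bandpass with 0 dB gain at the band center.

    The band edges are only approximate: Q is derived from the analog
    relation Q = center / bandwidth, without frequency-warping correction.
    """
    center = math.sqrt(low_hz * high_hz)       # geometric center of the band
    q = center / (high_hz - low_hz)            # quality factor from bandwidth
    w0 = 2.0 * math.pi * center / sample_rate  # normalized center frequency
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha                           # used to normalize so a[0] == 1
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def apply_biquad(signal, b, a):
    """Filter a sequence of samples with a direct form I biquad."""
    out = []
    x1 = x2 = y1 = y2 = 0.0                    # previous inputs and outputs
    for x0 in signal:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tone(freq_hz, sample_rate, n):
    """Generate n samples of a unit-amplitude sine wave."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


# Assumed settings (not taken from the paper).
SAMPLE_RATE = 16000
B, A = bandpass_coefficients(SAMPLE_RATE, 300.0, 3400.0)


def band_gain(freq_hz):
    """Measured RMS gain of the filter for a one-second test tone."""
    x = tone(freq_hz, SAMPLE_RATE, SAMPLE_RATE)
    return rms(apply_biquad(x, B, A)) / rms(x)
```

Under these assumptions, `band_gain(1000.0)` measures a tone inside the speech band and `band_gain(7000.0)` measures high-frequency content outside it; the first should stay close to unity while the second is strongly attenuated. A single biquad has gentle slopes, so a practical system would likely cascade several sections or use a higher-order design.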

Author information

Authors and Affiliations

University of Science, Ho Chi Minh City, Vietnam
Hao D. Do, Son T. Tran & Duc T. Chau
Vietnam National University, Ho Chi Minh City, Vietnam
Hao D. Do, Son T. Tran & Duc T. Chau
OLLI Technology JSC, Ho Chi Minh City, Vietnam
Hao D. Do

Corresponding author

Correspondence to Hao D. Do.

Editor information

Editors and Affiliations

Department of Applied Informatics, Wrocław University of Science and Technology, Wroclaw, Poland
Ngoc Thanh Nguyen
Thua Thien Hue Center of Information Technology, Hue, Vietnam
Bao Hung Hoang
Vietnam - Korea University of Information and Communication Technology, University of Da Nang, Da Nang, Vietnam
Cong Phap Huynh
Department of Computer Engineering, Yeungnam University, Gyeungsan, Korea (Republic of)
Dosam Hwang
Department of Applied Informatics, Wrocław University of Science and Technology, Wroclaw, Poland
Bogdan Trawiński
Department of Information Systems, University of Münster, Münster, Germany
Gottfried Vossen

About this paper

Cite this paper

Do, H.D., Tran, S.T., Chau, D.T. (2020). A Variational Autoencoder Approach for Speech Signal Separation. In: Nguyen, N.T., Hoang, B.H., Huynh, C.P., Hwang, D., Trawiński, B., Vossen, G. (eds) Computational Collective Intelligence. ICCCI 2020. Lecture Notes in Computer Science, vol 12496. Springer, Cham. https://doi.org/10.1007/978-3-030-63007-2_43

DOI: https://doi.org/10.1007/978-3-030-63007-2_43
Published: 23 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63006-5
Online ISBN: 978-3-030-63007-2
eBook Packages: Computer Science, Computer Science (R0)
