A Variational Autoencoder Approach for Speech Signal Separation

  • Conference paper
Computational Collective Intelligence (ICCCI 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12496)

Abstract

Speech separation plays an important role in speech-related systems because it can denoise, extract, and enhance speech signals. In recent years, many methods have been proposed to separate the human voice from noise and other sounds. To separate speech from a complicated signal, we propose a more powerful method that uses a variational autoencoder (VAE) model and then post-processes its output with a bandpass filter. This combination can be used to extract the original human speech from a mixture that contains not only high-frequency noise but also many other sounds. Our approach can be flexibly applied to new background sounds.
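
The bandpass post-processing step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the filter design or the cutoff frequencies, so everything specific here is an assumption made for illustration: a windowed-sinc FIR design with a Hamming window, a 300-3400 Hz passband (the conventional telephone speech band), a 16 kHz sample rate, and the hypothetical helper names `bandpass_fir` and `apply_fir`. The VAE stage is not shown; the input to the filter stands in for whatever signal that stage would produce.

```python
import math


def bandpass_fir(low_hz, high_hz, sample_rate, num_taps=101):
    """Return windowed-sinc FIR band-pass coefficients (Hamming window).

    Illustrative design only; the paper's actual filter is not specified here.
    """
    if not 0 < low_hz < high_hz < sample_rate / 2:
        raise ValueError("need 0 < low_hz < high_hz < sample_rate / 2")
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the filter has a center tap")

    mid = (num_taps - 1) // 2
    f_lo = low_hz / sample_rate   # normalized cutoffs (cycles per sample)
    f_hi = high_hz / sample_rate

    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            # Limit of the ideal band-pass impulse response at k = 0.
            ideal = 2.0 * (f_hi - f_lo)
        else:
            # Difference of two ideal low-pass responses.
            ideal = (math.sin(2 * math.pi * f_hi * k)
                     - math.sin(2 * math.pi * f_lo * k)) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def apply_fir(signal, taps):
    """Convolve signal with taps, returning an output of the same length.

    The output is shifted by the filter's group delay so it stays
    time-aligned with the input; samples outside the input count as zero.
    """
    mid = (len(taps) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, h in enumerate(taps):
            idx = i + mid - j
            if 0 <= idx < len(signal):
                acc += h * signal[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    fs = 16000  # assumed sample rate in Hz
    # Stand-in for a separated signal: a 1 kHz tone plus 6 kHz "hiss".
    mixture = [math.sin(2 * math.pi * 1000 * t / fs)
               + 0.5 * math.sin(2 * math.pi * 6000 * t / fs)
               for t in range(800)]
    cleaned = apply_fir(mixture, bandpass_fir(300, 3400, fs))
    print(len(cleaned))
```

With these assumed settings, components inside the passband are kept close to unchanged while components well above the upper cutoff are strongly attenuated. A filter of this kind only removes energy outside the chosen band, which is consistent with the abstract's use of a learned model to handle interfering sounds that overlap the speech band.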

Author information

Corresponding author

Correspondence to Hao D. Do.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Do, H.D., Tran, S.T., Chau, D.T. (2020). A Variational Autoencoder Approach for Speech Signal Separation. In: Nguyen, N.T., Hoang, B.H., Huynh, C.P., Hwang, D., Trawiński, B., Vossen, G. (eds) Computational Collective Intelligence. ICCCI 2020. Lecture Notes in Computer Science, vol 12496. Springer, Cham. https://doi.org/10.1007/978-3-030-63007-2_43

  • DOI: https://doi.org/10.1007/978-3-030-63007-2_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63006-5

  • Online ISBN: 978-3-030-63007-2

  • eBook Packages: Computer Science (R0)
