Skip to main content

Far Field Speech Enhancement at Low SNR in Presence of Nonstationary Noise Based on Spectral Masking and MVDR Beamforming

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 11096))

Abstract

Low Signal to Noise Ratio (SNR) conditions are highly likely during remote speech acquisition. This paper handles a method of remote speech multi-channel signal processing for speech enhancement in presence of strong nonstationary noise. The presented approach builds upon the Minimum Variance Distortionless response (MVDR) method, additionally filtering the multi-channel signal prior to MVDR beamforming coefficient estimation with a spectral mask. This mask is obtained by applying mixture observation vector clustering based on a spatial correlation model, which is estimated by a Complex Gaussian Mixture Model (CGMM). The posterior probabilities obtained during the CGMM Expectation-Maximization (EM) algorithm are used to estimate the cumulative noise mask, which is applied to the mixture. The masked mixture is then used to calculate the MVDR covariance matrix and beamforming coefficients. The method is tested on four mixtures acquired using a 66 microphone array at various low SNR. The results are compared to conventional MVDR and several other methods and validated using the Signal to Distortion Ratio (SDR) improvement metric. The results show that the presented method gives SDR improvement no less than 1–1.5 dB in the majority of cases, compared to MVDR, and performs best specifically at low SNR of \(-15\)\(-20\) dB.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Araki, S., Okada, M., Higuchi, T., Ogawa, A., Nakatani, T.: Spatial correlation model based observation vector clustering and MVDR beamforming for meeting recognition. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, pp. 385–389, March 2016

    Google Scholar 

  2. Benesty, J., Chen, J., Huang, Y., Cohen, I.: Noise Reduction in Speech Processing. Springer Topics in Signal Processing, vol. 2. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-00296-0

    Book  Google Scholar 

  3. Brandstein, M., Ward, D.: Microphone Arrays: Signal Processing Techniques and Applications. Digital Signal Processing, Heidelberg (2010). https://doi.org/10.1007/978-3-662-04619-7

    Book  Google Scholar 

  4. Cauchi, B., et al.: Combination of MVDR beamforming and single-channel spectral processing for enhancing noisy and reverberant speech. EURASIP J. Adv. Signal Process. 61 (2015)

    Google Scholar 

  5. Erdogan, H., Hershey, J.R., Watanabe, S., Mandel, M.I., Le Roux, J.: Improved MVDR beamforming using single-channel mask prediction networks. In: Proceedings of Interspeech Conference, INTERSPEECH, pp. 1981–1985 (2016)

    Google Scholar 

  6. Habets, E.A.P., Benesty, J.: A two-stage beamforming approach for noise reduction and dereverberation. IEEE Trans. Audio Speech Lang. Process. 21(5), 945–958 (2013)

    Article  Google Scholar 

  7. Higuchi, T., Ito, N., Araki, S., Yoshioka, T., Delcroix, M., Nakatani, T.: Online MVDR beamformer based on complex Gaussian mixture model with spatial prior for noise robust ASR. IEEE/ACM Trans. Audio Speech Lang. Process. 25(4), 780–793 (2017)

    Article  Google Scholar 

  8. Hong, L., Rosca, J., Balan, R.: Independent component analysis based single channel speech enhancement. In: Proceedings of the 3rd IEEE International Symposium on Signal Processing and Information Technology, Darmstadt, pp. 522–525, December 2003

    Google Scholar 

  9. Jaureguiberry, X., Vincent, E., Richard, G.: Fusion methods for speech enhancement and audio source separation. IEEE Trans. Audio Speech Lang. Process. 24(7), 1266–1279 (2016)

    Article  Google Scholar 

  10. Korenevsky, M.L., Matveev, Y.N., Yakovlev, A.V.: Investigation and development of methods for improving robustness of automatic speech recognition algorithms in complex acoustic environments. In: Anisimov, K.V., et al. (eds.) Proceedings of the Scientific-Practical Conference “Research and Development - 2016”, pp. 11–20. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-62870-7_2

    Chapter  Google Scholar 

  11. Oleinik, A.: A lightweight face tracking system for video surveillance. In: Campilho, A., Karray, F. (eds.) ICIAR 2016. LNCS, vol. 9730. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41501-7_46

    Chapter  Google Scholar 

  12. Prudnikov, A., Korenevsky, M., Aleinik, S.: Adaptive beamforming and adaptive training of DNN acoustic models for enhanced multichannel noisy speech recognition. In: IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Scottsdale, pp. 401–408, December 2015

    Google Scholar 

  13. Stolbov, M., Aleinik, S.: Speech enhancement with microphone array using frequency-domain alignment technique. In: Proceedings of the Audio Engineering Society 54th International Conference, Audio Forensics, London, pp. 1–6, June 2014

    Google Scholar 

  14. Upadhyay, N., Karmakar, A.: Speech enhancement using spectral subtraction-type algorithms: a comparison and simulation study. Procedia Comput. Sci. 54, 574–584 (2015)

    Article  Google Scholar 

  15. Zhao, Y., Jensen, J.R., Christensen, M.G., Doclo, S., Chen, J.: Experimental study of robust beamforming techniques for acoustic applications. In: Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 86–90, October 2017

    Google Scholar 

Download references

Acknowledgments

This research was financially supported by the Ministry of Education and Science of the Russian Federation, Contract 14.575.21.0132 (IDRFMEFI57517X0132).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Sergei Astapov .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Astapov, S., Lavrentyev, A., Shuranov, E. (2018). Far Field Speech Enhancement at Low SNR in Presence of Nonstationary Noise Based on Spectral Masking and MVDR Beamforming. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science(), vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-99579-3_3

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99578-6

  • Online ISBN: 978-3-319-99579-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics