A Deep Hybrid Model for Stereophonic Acoustic Echo Control

  • Short Paper
  • Circuits, Systems, and Signal Processing

Abstract

This paper proposes a deep hybrid model for stereophonic acoustic echo cancellation (SAEC). The model has two stages, a deep-learning-based Kalman filter (DeepKalman) and a gated convolutional recurrent network (GCRN)-based postfilter, which are jointly trained end to end. The proposed DeepKalman filter differs from the conventional Kalman filter in two respects. First, its input combines the two original far-end signals with a nonlinear reference signal estimated directly from the microphone signal. Second, a low-complexity recurrent neural network estimates the covariance of the process noise, which improves the tracking capability of the filter. In the second stage, the GCRN suppresses residual echo and noise by estimating complex masks that are applied to the output signal of the first stage. Computer simulations confirm that the proposed method outperforms existing SAEC algorithms.
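The two-stage structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one complex coefficient per input and frequency bin, three inputs (two far-end signals plus one nonlinear reference), and a constant process-noise variance `q` standing in for the value that the paper obtains from a recurrent network. The names `kalman_step`, `apply_complex_mask`, and `demo` are hypothetical, and the demo uses independent synthetic inputs rather than real, correlated stereo signals.

```python
import numpy as np


def kalman_step(w, P, x, y, q, r):
    """One Kalman update for a single frequency bin (illustrative only).

    w: (C,) complex echo-path estimate, one coefficient per input
    P: (C, C) Hermitian state-error covariance
    x: (C,) complex input spectra for the current frame
    y: complex microphone spectrum for the current frame
    q: process-noise variance (assumed here to be supplied externally,
       e.g. by a learned estimator; a constant in this sketch)
    r: observation-noise variance
    """
    P_prior = P + q * np.eye(len(w))         # random-walk state prediction
    e = y - x @ w                            # prior error, i.e. first-stage output
    s = np.real(x @ P_prior @ x.conj()) + r  # innovation variance
    k = P_prior @ x.conj() / s               # Kalman gain
    w_new = w + k * e                        # state update
    P_new = P_prior - np.outer(k, x) @ P_prior  # covariance update
    return w_new, P_new, e


def apply_complex_mask(mask, spec):
    """Second stage: multiply a complex mask with the first-stage spectrum."""
    return mask * spec


def demo(num_frames=200, seed=0):
    """Identify a fixed synthetic echo path and return the final misalignment."""
    rng = np.random.default_rng(seed)
    C = 3  # assumption: two far-end inputs plus one nonlinear reference
    w_true = rng.standard_normal(C) + 1j * rng.standard_normal(C)
    w = np.zeros(C, dtype=complex)
    P = np.eye(C, dtype=complex)
    for _ in range(num_frames):
        x = rng.standard_normal(C) + 1j * rng.standard_normal(C)
        y = x @ w_true  # noise-free synthetic microphone signal
        w, P, _ = kalman_step(w, P, x, y, q=1e-6, r=1e-4)
    return float(np.linalg.norm(w - w_true))


if __name__ == "__main__":
    print("final misalignment:", demo())
```

In this sketch a larger `q` keeps the gain from collapsing over time, so the filter reacts faster to echo-path changes at the cost of a noisier estimate. That trade-off is what a learned process-noise estimate is meant to control.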

Data Availability

The datasets generated during the current study are available from the corresponding author upon reasonable request.

Acknowledgements

This work was supported in part by the Beijing Natural Science Foundation under Grant 4242013, in part by the National Natural Science Foundation of China under Grant 62171438, and in part by the IACAS Frontier Exploration Project QYTS202111.

Author information

Corresponding authors

Correspondence to Feiran Yang or Jun Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Y., Liu, S., Yang, F. et al. A Deep Hybrid Model for Stereophonic Acoustic Echo Control. Circuits Syst Signal Process 43, 8046–8059 (2024). https://doi.org/10.1007/s00034-024-02807-x

  • DOI: https://doi.org/10.1007/s00034-024-02807-x

Keywords