A Deep Hybrid Model for Stereophonic Acoustic Echo Control

Liu, Yang; Liu, Sichen; Yang, Feiran; Yang, Jun

doi:10.1007/s00034-024-02807-x

Short Paper
Published: 07 August 2024

Volume 43, pages 8046–8059 (2024)

Circuits, Systems, and Signal Processing

Yang Liu^1,4,
Sichen Liu^2,
Feiran Yang^3 (ORCID: orcid.org/0000-0002-1734-3785) &
Jun Yang^1,4

Abstract

This paper proposes a deep hybrid model for stereophonic acoustic echo cancellation (SAEC). The model has two stages, a deep-learning-based Kalman filter (DeepKalman) and a gated convolutional recurrent network (GCRN)-based postfilter, which are trained jointly in an end-to-end manner. The proposed DeepKalman filter differs from the conventional Kalman filter in two ways. First, its input signal combines the two original far-end signals with a nonlinear reference signal estimated directly from the microphone signal. Second, a low-complexity recurrent neural network estimates the covariance of the process noise, which can improve the tracking capability of the DeepKalman filter. In the second stage, the GCRN suppresses residual echo and noise by estimating complex masks that are applied to the output signal of the first stage. Computer simulations confirm that the proposed method outperforms existing SAEC algorithms.
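To make the two-stage structure concrete, the following Python sketch shows a textbook Kalman update for a single STFT frequency bin with three reference channels (left far-end, right far-end, and a nonlinear reference), followed by a complex mask applied to the first-stage output. It is not the authors' DeepKalman implementation and rests on several assumptions: each echo path is modeled as one complex coefficient per bin (a multiplicative transfer function approximation that ignores crossband terms), the process-noise variance `q` is passed in as a number where the paper uses a recurrent neural network, the observation-noise variance `r` is assumed known, and the GCRN is replaced by a given complex mask value. The names `kalman_bin_step`, `apply_complex_mask`, and `simulate` are hypothetical.

```python
import random


def kalman_bin_step(w, P, x, y, q, r, a=1.0):
    """One Kalman update of the echo-path estimate in a single STFT bin (sketch).

    w : current complex coefficients, one per reference channel
    P : Hermitian state-error covariance (list of lists)
    x : reference values in this bin, e.g. [X_left, X_right, X_nonlinear]
    y : microphone value in this bin
    q : process-noise variance; the paper estimates it with an RNN, here it is a given number
    r : observation-noise variance (near-end speech plus noise), assumed known
    a : state-transition factor (hypothetical constant)
    """
    n = len(w)
    # Prediction step: w(t|t-1) = a w(t-1), P(t|t-1) = a^2 P(t-1) + q I
    w_pred = [a * wi for wi in w]
    P_pred = [[a * a * P[i][j] + (q if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    # Prior error: microphone minus predicted echo x^T w(t|t-1)
    e = y - sum(x[i] * w_pred[i] for i in range(n))
    # pcx = P(t|t-1) conj(x); innovation variance s = x^T P conj(x) + r (real-valued)
    pcx = [sum(P_pred[i][j] * x[j].conjugate() for j in range(n)) for i in range(n)]
    s = sum(x[i] * pcx[i] for i in range(n)).real + r
    k = [v / s for v in pcx]  # Kalman gain
    # Correction step; x^T P(t|t-1) equals conj(pcx) because P is Hermitian
    w_new = [w_pred[i] + k[i] * e for i in range(n)]
    P_new = [[P_pred[i][j] - k[i] * pcx[j].conjugate() for j in range(n)]
             for i in range(n)]
    return w_new, P_new, e


def apply_complex_mask(mask, e):
    """Second-stage postfilter in one bin: complex mask times first-stage output."""
    return mask * e


def simulate(steps=500, seed=0):
    """Identify a fixed three-channel echo path in one bin from noisy observations."""
    rng = random.Random(seed)
    true_w = [0.5 + 0.2j, -0.3 + 0.1j, 0.1 - 0.05j]
    n = len(true_w)
    w = [0j] * n
    P = [[1.0 + 0j if i == j else 0j for j in range(n)] for i in range(n)]
    for _ in range(steps):
        # Independent random references (real stereo references are highly correlated)
        x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        noise = complex(rng.gauss(0, 0.01), rng.gauss(0, 0.01))
        y = sum(xi * wi for xi, wi in zip(x, true_w)) + noise
        w, P, _ = kalman_bin_step(w, P, x, y, q=1e-6, r=1e-3)
    return true_w, w


if __name__ == "__main__":
    true_w, w_hat = simulate()
    print("max coefficient error:", max(abs(t - h) for t, h in zip(true_w, w_hat)))
```

The simulation draws the three references independently, which keeps the identification well conditioned. In practice the left and right far-end signals are strongly correlated, which is the classic nonuniqueness problem of stereophonic echo cancellation, so a real system would not converge this cleanly; a per-frame estimate of `q` is one way to let the filter trade tracking speed against steady-state accuracy.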

Data Availability

The datasets generated during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This work was supported in part by the Beijing Natural Science Foundation under Grant 4242013, in part by the National Natural Science Foundation of China under Grant 62171438, and in part by the IACAS Frontier Exploration Project QYTS202111.

Author information

Authors and Affiliations

Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, 100190, China
Yang Liu & Jun Yang
School of Advanced Technology, Xi’an Jiaotong-Liverpool University, Suzhou, 215123, China
Sichen Liu
State Key Laboratory of Acoustics, Institute of Acoustics, Chinese Academy of Sciences, Beijing, 100190, China
Feiran Yang
University of Chinese Academy of Sciences, Beijing, 100049, China
Yang Liu & Jun Yang

Corresponding authors

Correspondence to Feiran Yang or Jun Yang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Y., Liu, S., Yang, F. et al. A Deep Hybrid Model for Stereophonic Acoustic Echo Control. Circuits Syst Signal Process 43, 8046–8059 (2024). https://doi.org/10.1007/s00034-024-02807-x

Received: 20 March 2024
Revised: 19 July 2024
Accepted: 22 July 2024
Published: 07 August 2024
Issue Date: December 2024
DOI: https://doi.org/10.1007/s00034-024-02807-x