Abstract
This paper proposes a deep hybrid model for stereophonic acoustic echo cancellation (SAEC). The model has two stages, a deep-learning-based Kalman filter (DeepKalman) followed by a gated convolutional recurrent network (GCRN)-based postfilter, which are jointly trained end to end. The proposed DeepKalman filter differs from the conventional Kalman filter in two respects. First, its input combines the two original far-end signals with a nonlinear reference signal estimated directly from the microphone signal. Second, a low-complexity recurrent neural network estimates the covariance of the process noise, which can improve the tracking capability of the filter. In the second stage, the GCRN suppresses residual echo and noise by estimating complex masks that are applied to the output signal of the first stage. Computer simulations confirm that the proposed method outperforms existing SAEC algorithms.
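The two stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a single filter tap per frequency bin and per far-end channel, a random-walk state model, and a process-noise variance `q` passed in by the caller as a stand-in for the recurrent network's estimate. The nonlinear reference signal is omitted, and the function names `kalman_step` and `apply_complex_mask` are hypothetical.

```python
import numpy as np


def kalman_step(w, P, x, y, q, r):
    """One per-bin Kalman update for a two-channel echo path (illustrative only).

    w: (F, 2) complex echo-path estimates, one tap per far-end channel (assumption)
    P: (F, 2, 2) complex state-error covariance
    x: (F, 2) complex far-end spectra of the current frame
    y: (F,)   complex microphone spectrum of the current frame
    q: (F,)   process-noise variance; supplied by the caller here, standing in
              for the value a neural network would predict
    r: (F,)   observation-noise variance
    """
    # Prediction under a random-walk model: the state is unchanged,
    # the covariance grows by the process-noise variance.
    P_pred = P + q[:, None, None] * np.eye(2)

    # Prior error signal: microphone minus estimated echo, e = y - x^T w.
    e = y - np.einsum('fc,fc->f', x, w)

    # Kalman gain: k = P x^* / (x^T P x^* + r).
    Px = np.einsum('fij,fj->fi', P_pred, x.conj())
    s = np.einsum('fi,fi->f', x, Px).real + r
    k = Px / s[:, None]

    # Correction of the state and of the covariance, P = (I - k x^T) P.
    w_new = w + k * e[:, None]
    P_new = P_pred - np.einsum('fi,fj,fjk->fik', k, x, P_pred)
    return w_new, P_new, e


def apply_complex_mask(e, m_real, m_imag):
    """Apply a complex mask to the first-stage output spectrum.

    m_real and m_imag would come from the postfilter network; here they are
    plain arrays with the same shape as e.
    """
    return (m_real + 1j * m_imag) * e
```

In this sketch a larger `q` lets the filter follow echo-path changes more quickly at the cost of a noisier estimate, which is the trade-off that an estimated process-noise covariance is meant to manage. The error signal `e` returned by `kalman_step` is what the second function masks.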
Data Availability
The datasets generated during the current study are available from the corresponding author upon reasonable request.
Acknowledgements
This work was supported in part by the Beijing Natural Science Foundation under Grant 4242013, in part by the National Natural Science Foundation of China under Grant 62171438, and in part by the IACAS Frontier Exploration Project QYTS202111.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Liu, Y., Liu, S., Yang, F. et al. A Deep Hybrid Model for Stereophonic Acoustic Echo Control. Circuits Syst Signal Process 43, 8046–8059 (2024). https://doi.org/10.1007/s00034-024-02807-x
DOI: https://doi.org/10.1007/s00034-024-02807-x