ISCA Archive Interspeech 2021

A Causal U-Net Based Neural Beamforming Network for Real-Time Multi-Channel Speech Enhancement

Xinlei Ren, Xu Zhang, Lianwu Chen, Xiguang Zheng, Chen Zhang, Liang Guo, Bing Yu

People are meeting through video conferencing more often. While single-channel speech enhancement techniques are useful for individual participants, speech quality degrades significantly in large meeting rooms, where far-field and reverberant conditions arise. Approaches based on microphone array signal processing have been proposed to exploit the inter-channel correlation among the individual microphone channels. In this work, a new causal U-net based multiple-in-multiple-out structure is proposed for real-time multi-channel speech enhancement. The proposed method combines the traditional beamforming structure with the multi-channel causal U-net by explicitly adding a beamforming operation at the end of the neural beamformer. The proposed method was submitted to the INTERSPEECH Far-field Multi-Channel Speech Enhancement Challenge for Video Conferencing. With 1.97M model parameters and a real-time factor of 0.25 on an Intel Core i7 (2.6 GHz) CPU, the proposed method outperforms the baseline system of this challenge on the PESQ, Si-SNR and STOI metrics.
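The abstract does not spell out the form of the explicit beamforming operation at the end of the network. The sketch below assumes a filter-and-sum beamformer in the STFT domain: the U-net (not shown) is assumed to predict one complex weight per microphone, frame and frequency bin, and the enhanced spectrum is the weighted sum across channels, Y(t,f) = Σ_m W_m(t,f) X_m(t,f). The function name `filter_and_sum` and the nested-list layout `[mic][frame][bin]` are hypothetical illustrations, not the authors' implementation.

```python
from typing import List

# Hypothetical layout: one spectrogram per microphone, indexed [frame][bin].
Spectrogram = List[List[complex]]


def filter_and_sum(mics: List[Spectrogram], weights: List[Spectrogram]) -> Spectrogram:
    """Assumed filter-and-sum step: Y[t][f] = sum over m of W[m][t][f] * X[m][t][f].

    Each output frame depends only on the same input frame and weight frame,
    so this final step adds no look-ahead; causality then rests on the network
    that predicts the weights.
    """
    if len(mics) != len(weights):
        raise ValueError("need one weight spectrogram per microphone")
    num_frames = len(mics[0])
    num_bins = len(mics[0][0])
    out = [[0j] * num_bins for _ in range(num_frames)]
    for x_m, w_m in zip(mics, weights):
        for t in range(num_frames):
            for f in range(num_bins):
                # Complex weights can rescale and phase-align each channel.
                out[t][f] += w_m[t][f] * x_m[t][f]
    return out


# Two microphones, one frame, one bin: mic 2 sees the source with a 90-degree
# phase shift (1j); a complex weight of -0.5j undoes it, so the channels add
# coherently.
aligned = filter_and_sum([[[1]], [[1j]]], [[[0.5]], [[-0.5j]]])
print(aligned)
```

With real, equal weights (0.5 per channel) the same function reduces to plain averaging across microphones; letting a network choose complex weights per time-frequency bin is what allows the final step to act as an adaptive beamformer.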


doi: 10.21437/Interspeech.2021-1457

Cite as: Ren, X., Zhang, X., Chen, L., Zheng, X., Zhang, C., Guo, L., Yu, B. (2021) A Causal U-Net Based Neural Beamforming Network for Real-Time Multi-Channel Speech Enhancement. Proc. Interspeech 2021, 1832-1836, doi: 10.21437/Interspeech.2021-1457

@inproceedings{ren21_interspeech,
  author={Xinlei Ren and Xu Zhang and Lianwu Chen and Xiguang Zheng and Chen Zhang and Liang Guo and Bing Yu},
  title={{A Causal U-Net Based Neural Beamforming Network for Real-Time Multi-Channel Speech Enhancement}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1832--1836},
  doi={10.21437/Interspeech.2021-1457}
}