ISCA Archive Interspeech 2021

Real-Time Multi-Channel Speech Enhancement Based on Neural Network Masking with Attention Model

Cheng Xue, Weilong Huang, Weiguang Chen, Jinwei Feng

In this paper, we propose a real-time multi-channel speech enhancement method for noise reduction and dereverberation in far-field environments. The proposed method consists of two components: differential beamforming and a mask estimation network. Differential beamforming is employed to suppress interference signals from non-target directions so that a relatively clean speech signal can be obtained. The mask estimation network, which incorporates an attention model, is developed to capture the signal correlation among different channels in the feature extraction stage and to enhance the feature representation that is reconstructed into the target speech in the mask estimation stage. In the inference phase, the spectrum after differential beamforming is filtered by the estimated mask to obtain the final output. The spectrum after differential beamforming can provide a higher signal-to-noise ratio (SNR) than the original spectrum, so the estimated mask can more easily filter out the noise. We conducted experiments on the ConferencingSpeech2021 challenge (INTERSPEECH 2021) dataset to evaluate the proposed method. With only 2.9M parameters, the proposed method achieved competitive performance.
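The inference step described in the abstract (beamform first, then filter the beamformed spectrum with the estimated mask) can be illustrated with a minimal NumPy sketch. The abstract does not specify the beamformer design or the array geometry, so everything here is an assumption for illustration: a two-microphone, first-order delay-and-subtract differential beamformer, the hypothetical function names `differential_beamform` and `apply_mask`, and the default microphone spacing. In the actual method the mask would come from the attention-based network; here it is simply passed in as an array.

```python
import numpy as np


def differential_beamform(X_front, X_back, freqs_hz,
                          mic_distance_m=0.03, speed_of_sound=343.0):
    """First-order delay-and-subtract beamformer in the STFT domain.

    Assumed form (not taken from the paper): the back-microphone spectrum
    is delayed by the inter-microphone travel time and subtracted from the
    front-microphone spectrum, which places a null toward the back.
    X_front, X_back: complex arrays of shape (num_freqs, num_frames).
    """
    tau = mic_distance_m / speed_of_sound                    # travel time in seconds
    delay = np.exp(-2j * np.pi * freqs_hz * tau)[:, None]    # shape (num_freqs, 1)
    return X_front - delay * X_back


def apply_mask(Y_beam, mask):
    """Filter the beamformed spectrum with a time-frequency mask."""
    return mask * Y_beam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_freqs, num_frames = 257, 10
    freqs = np.linspace(0.0, 8000.0, num_freqs)
    X_front = rng.standard_normal((num_freqs, num_frames)) + 0j
    X_back = rng.standard_normal((num_freqs, num_frames)) + 0j

    Y = differential_beamform(X_front, X_back, freqs)
    # Placeholder mask in [0, 1]; a trained network would estimate this.
    mask = rng.uniform(0.0, 1.0, size=Y.shape)
    S_hat = apply_mask(Y, mask)
    print(S_hat.shape)
```

Because the mask is applied to the beamformed spectrum rather than to a raw microphone channel, the network only has to remove the residual noise left after spatial filtering, which is the motivation the abstract gives for the two-stage design.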


doi: 10.21437/Interspeech.2021-2266

Cite as: Xue, C., Huang, W., Chen, W., Feng, J. (2021) Real-Time Multi-Channel Speech Enhancement Based on Neural Network Masking with Attention Model. Proc. Interspeech 2021, 1862-1866, doi: 10.21437/Interspeech.2021-2266

@inproceedings{xue21_interspeech,
  author={Cheng Xue and Weilong Huang and Weiguang Chen and Jinwei Feng},
  title={{Real-Time Multi-Channel Speech Enhancement Based on Neural Network Masking with Attention Model}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1862--1866},
  doi={10.21437/Interspeech.2021-2266}
}