Multi-Stage Progressive Speech Enhancement Network

Xu, Xinmeng; Wang, Yang; Xu, Dongxiang; Peng, Yiyuan; Zhang, Cong; Jia, Jie; Chen, Binbin

doi:10.21437/Interspeech.2021-520

Speech enhancement is a fundamental technique for recovering clean speech in adverse environments, where the received speech is seriously corrupted by noise. This paper proposes a novel progressive network for speech enhancement built on a multi-stage structure, in which each stage contains a channel attention block followed by a dilated encoder-decoder convolutional network with gated linear units. In addition, each stage generates a prediction that is refined by a supervised attention block, and a fusion block is inserted between the original inputs and the outputs of the previous stage. The multi-stage architecture sequentially invokes multiple deep-learning networks, and its key ingredient is the information exchange between stages, which allows more flexible and robust outputs to be generated. Experimental results show that the proposed architecture consistently outperforms recent state-of-the-art models in terms of both PESQ and STOI scores.
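The data flow described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract gives no equations, so the squeeze-and-excitation form of the channel attention, the channel-stacking fusion, the residual form of the supervised attention, and every name and shape below (`stage`, `run_stages`, `init_params`, `w1`, `w2`, `wa`, `wb`) are assumptions made for illustration. A single gated linear unit stands in for the dilated encoder-decoder network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glu(a, b):
    """Gated linear unit: linear path `a` gated by the sigmoid of path `b`."""
    return a * sigmoid(b)

def channel_attention(feat, w1, w2):
    """Assumed squeeze-and-excitation style attention over feat of shape (C, T, F)."""
    squeezed = feat.mean(axis=(1, 2))         # one statistic per channel, shape (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)   # bottleneck with ReLU
    weights = sigmoid(w2 @ hidden)            # per-channel weights in (0, 1)
    return feat * weights[:, None, None]

def stage(noisy, prev_out, params):
    """One stage: fusion -> channel attention -> gated body -> supervised attention."""
    # Fusion block (assumed): stack the original input and the previous output as channels.
    fused = np.stack([noisy, prev_out], axis=0)           # (2, T, F)
    attended = channel_attention(fused, params["w1"], params["w2"])
    # Stand-in for the dilated encoder-decoder: two channel mixes combined by a GLU.
    a = np.tensordot(params["wa"], attended, axes=1)      # (T, F)
    b = np.tensordot(params["wb"], attended, axes=1)      # (T, F)
    features = glu(a, b)
    # Supervised attention (assumed): the stage prediction gates the features passed on.
    prediction = noisy + features
    refined = features * sigmoid(prediction) + features
    return prediction, refined

def run_stages(noisy, stage_params):
    """Run the stages in sequence and collect each stage's prediction."""
    predictions, prev = [], noisy
    for params in stage_params:
        prediction, prev = stage(noisy, prev, params)
        predictions.append(prediction)
    return predictions

def init_params(rng):
    """Hypothetical random parameters for one stage with two fused channels."""
    return {
        "w1": rng.standard_normal((1, 2)),
        "w2": rng.standard_normal((2, 1)),
        "wa": rng.standard_normal(2),
        "wb": rng.standard_normal(2),
    }
```

Calling `run_stages(noisy, [init_params(rng) for _ in range(3)])` on a `(T, F)` magnitude spectrogram returns one prediction per stage, each with the input's shape. In a trained system each of these predictions could be supervised against the clean target, which is the sense in which later stages progressively refine earlier ones.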

Cite as: Xu, X., Wang, Y., Xu, D., Peng, Y., Zhang, C., Jia, J., Chen, B. (2021) Multi-Stage Progressive Speech Enhancement Network. Proc. Interspeech 2021, 2691-2695, doi: 10.21437/Interspeech.2021-520

@inproceedings{xu21g_interspeech,
  author={Xinmeng Xu and Yang Wang and Dongxiang Xu and Yiyuan Peng and Cong Zhang and Jie Jia and Binbin Chen},
  title={{Multi-Stage Progressive Speech Enhancement Network}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={2691--2695},
  doi={10.21437/Interspeech.2021-520}
}