Speech enhancement is a fundamental technique for recovering clean speech in adverse environments, where the received speech is seriously corrupted by noise. This paper presents a novel progressive network for speech enhancement built on a multi-stage structure, in which each stage contains a channel attention block followed by a dilated encoder-decoder convolutional network with gated linear units. In addition, the prediction generated by each stage is refined by a supervised attention block. Moreover, a fusion block is inserted between the original inputs and the outputs of the previous stage. The multi-stage architecture sequentially invokes multiple deep-learning networks, and its key ingredient is the exchange of information between stages, which allows more flexible and robust outputs to be generated. Experimental results show that the proposed architecture consistently outperforms recent state-of-the-art models in terms of both perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) scores.
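The abstract does not give layer-level details, so the following is only a minimal NumPy sketch of how the described pieces could fit together, under explicit assumptions. Channel attention is assumed to be squeeze-and-excitation style. The encoder-decoder is collapsed to one dilated convolution with a gated linear unit (GLU) and one decoder convolution, with no down/upsampling. The supervised attention block is assumed to turn each stage's prediction into a sigmoid mask that re-weights the stage features. The fusion block is assumed to be a 1x1 projection over the concatenated input embedding and previous-stage features. All function names and parameter keys (`stage`, `init_stage`, `multi_stage_enhance`, `"fuse"`, `"to_mask"`, ...) are hypothetical, the weights are random, and this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glu(x, w_a, w_b):
    """Gated linear unit over channels: (W_a x) * sigmoid(W_b x); x has shape (C, T)."""
    return (w_a @ x) * sigmoid(w_b @ x)

def channel_attention(x, w1, w2):
    """Assumed squeeze-and-excitation style channel attention."""
    s = x.mean(axis=1)                          # squeeze over time -> (C,)
    g = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))   # excitation gates -> (C,)
    return x * g[:, None]                       # rescale each channel

def dilated_conv1d(x, w, dilation):
    """Length-preserving 1-D conv, kernel size 3; w has shape (C_out, C_in, 3)."""
    c_out, _, k = w.shape
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    T = x.shape[1]
    y = np.zeros((c_out, T))
    for j in range(k):                          # tap j looks j*dilation frames ahead in xp
        y += w[:, :, j] @ xp[:, j * dilation: j * dilation + T]
    return y

def supervised_attention(feat, noisy, p):
    """Hypothetical supervised attention: the stage prediction gates the features."""
    pred = noisy + p["to_out"] @ feat           # stage prediction (F, T); supervised in training
    mask = sigmoid(p["to_mask"] @ pred)         # attention map (C, T) derived from the prediction
    return feat * mask + feat, pred

def stage(noisy, prev_feat, p):
    x = p["embed"] @ noisy                      # embed input (F, T) -> (C, T)
    if prev_feat is not None:                   # assumed fusion block: concat + 1x1 projection
        x = p["fuse"] @ np.concatenate([x, prev_feat], axis=0)
    x = channel_attention(x, p["ca1"], p["ca2"])
    h = glu(dilated_conv1d(x, p["enc"], dilation=2), p["ga"], p["gb"])  # "encoder"
    h = dilated_conv1d(h, p["dec"], dilation=1) + x                     # "decoder" + skip
    return supervised_attention(h, noisy, p)

def init_stage(F, C, rng):
    n = lambda *shape: rng.standard_normal(shape) * 0.1
    return {"embed": n(C, F), "fuse": n(C, 2 * C), "ca1": n(C // 2, C), "ca2": n(C, C // 2),
            "enc": n(C, C, 3), "ga": n(C, C), "gb": n(C, C), "dec": n(C, C, 3),
            "to_out": n(F, C), "to_mask": n(C, F)}

def multi_stage_enhance(noisy, stages):
    """Run stages in sequence; each stage sees the original input plus the previous features."""
    feat, preds = None, []
    for p in stages:
        feat, pred = stage(noisy, feat, p)
        preds.append(pred)                      # in training, every prediction would be compared to clean speech
    return preds

F, C, T = 16, 8, 50                             # frequency bins, channels, frames (illustrative)
stages = [init_stage(F, C, rng) for _ in range(3)]
noisy = np.abs(rng.standard_normal((F, T)))     # stand-in for a noisy magnitude spectrogram
preds = multi_stage_enhance(noisy, stages)
print(len(preds), preds[-1].shape)              # 3 (16, 50)
```

Returning every stage's prediction, rather than only the last one, is what would let a loss supervise each stage separately; the final enhanced output would be `preds[-1]`.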
Cite as: Xu, X., Wang, Y., Xu, D., Peng, Y., Zhang, C., Jia, J., Chen, B. (2021) Multi-Stage Progressive Speech Enhancement Network. Proc. Interspeech 2021, 2691-2695, doi: 10.21437/Interspeech.2021-520
@inproceedings{xu21g_interspeech,
  author={Xinmeng Xu and Yang Wang and Dongxiang Xu and Yiyuan Peng and Cong Zhang and Jie Jia and Binbin Chen},
  title={{Multi-Stage Progressive Speech Enhancement Network}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={2691--2695},
  doi={10.21437/Interspeech.2021-520}
}