Capturing contextual information at multiple scales is known to improve the performance of DNN-based speech enhancement (SE) models. This paper proposes a new SE model, called NUNet-TLS, which has two-level skip connections between the residual U-Blocks nested in each layer of a large U-Net structure. The proposed model also applies causal time-frequency attention (CTFA) at the output of the residual U-Block to boost the dynamic representation of the speech context at multiple scales. Despite the two-level skip connections, the proposed model only slightly increases the number of network parameters, while the performance improvement is significant. Experimental results show that the proposed NUNet-TLS outperforms other state-of-the-art models on various objective evaluation metrics. The code of our model is available at https://github.com/seorim0/NUNet-TLS
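To make the nesting idea concrete, the following is a minimal toy sketch, not the authors' implementation (see the repository above for that). It assumes that the two levels of skip connections can be pictured as inner skips inside each residual U-Block and outer skips between blocks of the large U-Net; the abstract does not spell this out, so the wiring is an assumption. All function names (`down`, `up`, `residual_u_block`, `nested_u_net`) are hypothetical, plain Python lists stand in for feature maps, and fixed averaging and repetition stand in for learned convolutions. The attention module is omitted.

```python
# Toy 1-D illustration only: lists stand in for feature maps, and fixed
# averaging/repetition stands in for learned (de)convolutions.

def down(x):
    """Halve the length by averaging neighbouring pairs (stand-in for a strided conv)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]

def up(x):
    """Double the length by repeating each sample (stand-in for a transposed conv)."""
    return [v for v in x for _ in range(2)]

def add(a, b):
    """Element-wise sum, used here for every skip and residual connection."""
    return [p + q for p, q in zip(a, b)]

def residual_u_block(x, depth):
    """Small U-shaped block: inner skips (assumed level 1) plus a residual path."""
    skips, h = [], x
    for _ in range(depth):
        skips.append(h)
        h = down(h)
    for s in reversed(skips):
        h = add(up(h), s)          # inner skip connection
    return add(h, x)               # residual connection around the block

def nested_u_net(x, outer_depth=2, inner_depth=1):
    """Outer U-Net whose stages are residual U-blocks; outer skips are assumed level 2."""
    skips, h = [], x
    for _ in range(outer_depth):
        h = residual_u_block(h, inner_depth)
        skips.append(h)
        h = down(h)
    h = residual_u_block(h, inner_depth)   # bottleneck stage
    for s in reversed(skips):
        h = add(up(h), s)          # outer skip connection between blocks
        h = residual_u_block(h, inner_depth)
    return h
```

The input length must be divisible by 2 raised to `outer_depth + inner_depth` so that every downsampling step has an even number of samples; for example, `nested_u_net([0.0] * 8)` returns a list of the same length as its input.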
Cite as: Hwang, S., Park, S.W., Park, Y. (2022) Monoaural Speech Enhancement Using a Nested U-Net with Two-Level Skip Connections. Proc. Interspeech 2022, 191-195, doi: 10.21437/Interspeech.2022-10025
@inproceedings{hwang22b_interspeech,
  author={Seorim Hwang and Sung Wook Park and Youngcheol Park},
  title={{Monoaural Speech Enhancement Using a Nested U-Net with Two-Level Skip Connections}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={191--195},
  doi={10.21437/Interspeech.2022-10025}
}