Monoaural Speech Enhancement Using a Nested U-Net with Two-Level Skip Connections

Hwang, Seorim; Park, Sung Wook; Park, Youngcheol

doi:10.21437/Interspeech.2022-10025

Capturing multi-scale contextual information is known to improve the performance of DNN-based speech enhancement (SE) models. This paper proposes a new SE model, called NUNet-TLS, which has two-level skip connections between the residual U-Blocks nested in each layer of a large U-Net structure. The proposed model also applies causal time-frequency attention (CTFA) at the output of the residual U-Block to boost the dynamic representation of the speech context at multiple scales. The two-level skip connections only slightly increase the number of network parameters, yet the performance improvement is significant. Experimental results show that the proposed NUNet-TLS outperforms other state-of-the-art models on various objective evaluation metrics. The code for our model is available at https://github.com/seorim0/NUNet-TLS
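
To make the nesting idea concrete, the toy sketch below traces how data could flow through a large U-Net whose layers are themselves small residual U-shaped blocks. It is only an illustration under strong assumptions, not the authors' implementation: the function names (`residual_u_block`, `nested_u_net`) are hypothetical, the learned convolutions are replaced by fixed averaging and sample repetition on a 1-D list, the attention module is omitted, and the two skip levels are modeled simply as one skip inside each block and one between encoder and decoder blocks, which may differ from the exact wiring in the paper. The released repository is the reference for the actual model.

```python
def downsample(x):
    """Halve the length by averaging adjacent pairs (stand-in for a strided conv)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upsample(x):
    """Double the length by repeating each sample (stand-in for a transposed conv)."""
    return [v for v in x for _ in range(2)]


def add(a, b):
    """Element-wise sum, used here for every skip and residual connection."""
    return [p + q for p, q in zip(a, b)]


def residual_u_block(x, depth=1):
    """Hypothetical small U-shaped block with inner skips and a residual path."""
    skips = []
    h = x
    for _ in range(depth):          # inner encoder
        skips.append(h)
        h = downsample(h)
    for _ in range(depth):          # inner decoder
        h = add(upsample(h), skips.pop())   # skip inside the block
    return add(h, x)                # residual connection around the block


def nested_u_net(x, levels=2, block_depth=1):
    """Hypothetical outer U-Net whose layers are residual U-blocks."""
    outer_skips = []
    h = x
    for _ in range(levels):         # outer encoder
        h = residual_u_block(h, block_depth)
        outer_skips.append(h)
        h = downsample(h)
    h = residual_u_block(h, block_depth)    # bottleneck block
    for _ in range(levels):         # outer decoder
        h = add(upsample(h), outer_skips.pop())  # skip between blocks
        h = residual_u_block(h, block_depth)
    return h


# The input length must be divisible by 2 ** (levels + block_depth).
frame = [1.0] * 8
enhanced = nested_u_net(frame)
print(len(enhanced))  # same length as the input frame
```

The sketch has no trainable parameters, so it says nothing about enhancement quality. It only shows that each scale of the outer network is processed by a block that is itself multi-scale, and that both the inner and outer paths carry skip connections.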

Cite as: Hwang, S., Park, S.W., Park, Y. (2022) Monoaural Speech Enhancement Using a Nested U-Net with Two-Level Skip Connections. Proc. Interspeech 2022, 191-195, doi: 10.21437/Interspeech.2022-10025

@inproceedings{hwang22b_interspeech,
  author={Seorim Hwang and Sung Wook Park and Youngcheol Park},
  title={{Monoaural Speech Enhancement Using a Nested U-Net with Two-Level Skip Connections}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={191--195},
  doi={10.21437/Interspeech.2022-10025}
}