ISCA Archive Interspeech 2021

Lightweight Causal Transformer with Local Self-Attention for Real-Time Speech Enhancement

Koen Oostermeijer, Qing Wang, Jun Du

In this paper, we describe a novel speech enhancement transformer architecture. The model uses local causal self-attention, which makes it lightweight and therefore particularly well suited for real-time speech enhancement in resource-constrained environments. In addition, we present several ablation studies on different parts of the model and the loss function to determine which modifications yield the best improvements. Using this knowledge, we propose a final version of our architecture, which we submitted to the INTERSPEECH 2021 DNS Challenge, where it achieved competitive results despite using only 2% of the maximum allowed computation. Furthermore, we performed experiments comparing it with LSTM and CNN models that had 127% and 257% more parameters, respectively. Despite this difference in model size, our model achieved significant improvements on the considered speech quality and intelligibility measures.
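To illustrate the core idea of local causal self-attention described in the abstract, the sketch below shows one plausible PyTorch realization: each frame attends only to itself and a fixed number of preceding frames. This is an assumption for illustration, not the authors' implementation; the function name local_causal_attention and the window hyper-parameter are hypothetical, and the paper's actual window size and layer details may differ.

    import torch
    import torch.nn.functional as F

    def local_causal_attention(q, k, v, window: int):
        """Scaled dot-product attention where frame t attends only to
        frames s with t - window < s <= t (causal and local).

        q, k, v: tensors of shape (batch, time, dim).
        """
        B, T, D = q.shape
        scores = q @ k.transpose(-2, -1) / D ** 0.5        # (B, T, T)

        # Band mask: query t may attend to key s iff s <= t and s > t - window.
        idx = torch.arange(T)
        allowed = (idx[None, :] <= idx[:, None]) & (idx[None, :] > idx[:, None] - window)
        scores = scores.masked_fill(~allowed, float("-inf"))

        return F.softmax(scores, dim=-1) @ v               # (B, T, D)

    # Usage example with made-up shapes:
    x = torch.randn(1, 100, 64)
    out = local_causal_attention(x, x, x, window=16)

Note that this naive sketch still materializes the full T-by-T score matrix for clarity; a genuinely lightweight implementation would compute only the band of width `window`, reducing cost from O(T^2) to O(T * window), which is presumably where the computational savings reported in the paper come from.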


doi: 10.21437/Interspeech.2021-668

Cite as: Oostermeijer, K., Wang, Q., Du, J. (2021) Lightweight Causal Transformer with Local Self-Attention for Real-Time Speech Enhancement. Proc. Interspeech 2021, 2831-2835, doi: 10.21437/Interspeech.2021-668

@inproceedings{oostermeijer21_interspeech,
  author={Koen Oostermeijer and Qing Wang and Jun Du},
  title={{Lightweight Causal Transformer with Local Self-Attention for Real-Time Speech Enhancement}},
  year={2021},
  booktitle={Proc. Interspeech 2021},
  pages={2831--2835},
  doi={10.21437/Interspeech.2021-668}
}