A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement

Tan, Ke; Wang, DeLiang

doi:10.21437/Interspeech.2018-1405

Many real-world applications of speech enhancement, such as hearing aids and cochlear implants, require real-time processing with little or no latency. In this paper, we propose a novel convolutional recurrent network (CRN) to address real-time monaural speech enhancement. We incorporate a convolutional encoder-decoder (CED) and long short-term memory (LSTM) into the CRN architecture, which leads to a causal system that is naturally suitable for real-time processing. Moreover, the proposed model is noise- and speaker-independent, i.e., noise types and speakers can differ between training and testing. Our experiments suggest that the CRN leads to consistently better objective intelligibility and perceptual quality than an existing LSTM-based model. In addition, the CRN has far fewer trainable parameters.
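The causality property mentioned in the abstract can be illustrated with a small sketch. This is not the authors' network: it is a toy, pure-Python causal convolution along the time axis, and the names `causal_conv_time`, `frames`, and `kernel` are hypothetical. It only shows the general idea that each output frame is computed from the current and past input frames, so no look-ahead is needed.

```python
from typing import List

Frame = List[float]  # one feature frame, e.g. spectral magnitudes (assumed layout)


def causal_conv_time(frames: List[Frame], kernel: List[float]) -> List[Frame]:
    """Filter each frequency bin along time using only current and past frames.

    kernel[0] weights the current frame and kernel[k] the frame k steps back.
    Frames before the start of the signal are treated as zeros (left padding only).
    """
    out: List[Frame] = []
    for t in range(len(frames)):
        acc = [0.0] * len(frames[t])
        for k, w in enumerate(kernel):
            if t - k < 0:
                break  # nothing exists before the first frame
            past = frames[t - k]
            for f in range(len(acc)):
                acc[f] += w * past[f]
        out.append(acc)
    return out


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    kernel = [0.5, 0.5]  # average of the current and previous frame
    print(causal_conv_time(frames, kernel))
```

Because the filter never reads frame t+1 when producing frame t, changing a later input frame leaves all earlier outputs unchanged, which is the property that makes frame-by-frame (real-time) processing possible.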

Cite as: Tan, K., Wang, D. (2018) A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement. Proc. Interspeech 2018, 3229-3233, doi: 10.21437/Interspeech.2018-1405

@inproceedings{tan18_interspeech,
  author={Ke Tan and DeLiang Wang},
  title={{A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement}},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3229--3233},
  doi={10.21437/Interspeech.2018-1405}
}