This study investigates deep-learning-based signal-to-noise ratio (SNR) estimation at the frame level. We propose to employ recurrent neural networks (RNNs) with long short-term memory (LSTM) to leverage contextual information for this task. Because acoustic features are important for deep learning algorithms, we also examine a variety of monaural features and investigate feature combinations using Group Lasso and sequential floating forward selection. By replacing LSTM with bidirectional LSTM, the proposed algorithm naturally leads to a long-term SNR estimator. Systematic evaluations demonstrate that the proposed SNR estimators significantly outperform other frame-level and long-term SNR estimators.
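To make the quantity being estimated concrete, the following is a minimal sketch of how a frame-level SNR reference could be computed when the clean speech and the noise are available separately, as in simulated training data. It assumes the common definition of per-frame SNR as the ratio of speech energy to noise energy in decibels; the abstract does not state the exact definition used in the paper. The function name `frame_snr_db`, the frame length, the hop size, and the `eps` floor are all illustrative assumptions.

```python
import math


def frame_snr_db(speech, noise, frame_len=320, hop=160, eps=1e-12):
    """Return one SNR value (in dB) per frame.

    Assumption: SNR per frame is 10 * log10(speech energy / noise energy).
    frame_len=320 and hop=160 correspond to 20 ms / 10 ms at 16 kHz,
    which are illustrative values, not taken from the paper.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")

    snrs = []
    # Slide a window over the signals; incomplete trailing frames are dropped.
    for start in range(0, len(speech) - frame_len + 1, hop):
        s_energy = sum(x * x for x in speech[start:start + frame_len])
        n_energy = sum(x * x for x in noise[start:start + frame_len])
        # eps keeps the ratio finite for silent frames.
        snrs.append(10.0 * math.log10((s_energy + eps) / (n_energy + eps)))
    return snrs


if __name__ == "__main__":
    # Constant-amplitude toy signals: amplitude ratio 10 gives about 20 dB.
    speech = [1.0] * 640
    noise = [0.1] * 640
    print(frame_snr_db(speech, noise))
```

A learned estimator such as the one described above would be trained to predict values like these from features of the noisy mixture alone, since the separate speech and noise signals are not available at test time.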
Cite as: Li, H., Wang, D., Zhang, X., Gao, G. (2020) Frame-Level Signal-to-Noise Ratio Estimation Using Deep Learning. Proc. Interspeech 2020, 4626-4630, doi: 10.21437/Interspeech.2020-2475
@inproceedings{li20la_interspeech,
  author={Hao Li and DeLiang Wang and Xueliang Zhang and Guanglai Gao},
  title={{Frame-Level Signal-to-Noise Ratio Estimation Using Deep Learning}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4626--4630},
  doi={10.21437/Interspeech.2020-2475}
}