Previous studies have shown that a specialized speech enhancement model can outperform a general model when the test condition matches the training condition. Therefore, choosing the correct (matched) candidate model from a set of ensemble models is critical for achieving generalizability. Although the ideal decision criterion would be the evaluation metric itself, such metrics require a clean reference signal, which makes them impractical to employ. In this paper, we propose a novel specialized speech enhancement model selection (SSEMS) approach that applies a non-intrusive quality estimation model, termed Quality-Net, to solve this problem. Experimental results confirm the effectiveness of the proposed SSEMS approach. Moreover, we observe that the accuracy of Quality-Net in choosing the most suitable model increases as the input SNR increases. As a result, the proposed system outperforms both an auto-encoder-based model selection approach and a general model, particularly under high-SNR conditions.
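The selection idea can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that every specialized model enhances the noisy input and that the output with the highest predicted quality score is kept. The function `select_enhanced_output`, the toy "models", and `toy_quality` (a placeholder standing in for Quality-Net) are all hypothetical. The key point is that the scorer sees only the enhanced signal and never a clean reference.

```python
from typing import Callable, Dict, List, Tuple

Signal = List[float]


def select_enhanced_output(
    noisy_signal: Signal,
    models: Dict[str, Callable[[Signal], Signal]],
    quality_estimator: Callable[[Signal], float],
) -> Tuple[str, Signal, float]:
    """Enhance the input with every specialized model and keep the output
    that the non-intrusive quality estimator scores highest.
    No clean reference signal is used anywhere."""
    if not models:
        raise ValueError("at least one candidate model is required")
    best_name, best_output, best_score = "", [], float("-inf")
    for name, enhance in models.items():
        enhanced = enhance(noisy_signal)
        score = quality_estimator(enhanced)  # hypothetical stand-in for Quality-Net
        if score > best_score:
            best_name, best_output, best_score = name, enhanced, score
    return best_name, best_output, best_score


# Hypothetical toy scorer: treats lower mean energy as "higher quality".
# A real system would use a learned predictor such as Quality-Net instead.
def toy_quality(signal: Signal) -> float:
    return -sum(v * v for v in signal) / len(signal)


# Hypothetical "specialized models": simple gains chosen only for illustration.
models = {
    "general_model": lambda x: list(x),
    "car_noise_model": lambda x: [0.5 * v for v in x],
    "babble_noise_model": lambda x: [0.8 * v for v in x],
}

name, output, score = select_enhanced_output([1.0, -2.0, 3.0], models, toy_quality)
print(name, output)  # car_noise_model [0.5, -1.0, 1.5]
```

Because the score depends only on the enhanced output, the same loop works at inference time, when no clean reference is available. This is the constraint that rules out intrusive evaluation metrics as the selection criterion.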
Cite as: Zezario, R.E., Fu, S.-W., Lu, X., Wang, H.-M., Tsao, Y. (2019) Specialized Speech Enhancement Model Selection Based on Learned Non-Intrusive Quality Assessment Metric. Proc. Interspeech 2019, 3168-3172, doi: 10.21437/Interspeech.2019-2425
@inproceedings{zezario19_interspeech,
  author={Ryandhimas E. Zezario and Szu-Wei Fu and Xugang Lu and Hsin-Min Wang and Yu Tsao},
  title={{Specialized Speech Enhancement Model Selection Based on Learned Non-Intrusive Quality Assessment Metric}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3168--3172},
  doi={10.21437/Interspeech.2019-2425},
  issn={2308-457X}
}