Abstract:
Speech enhancement has traditionally depended critically on estimating the local time-varying SNR, but it was recently shown that the SNR can be marginalized in a Bayesian sense from the minimum-mean-square-error (MMSE) solution. Specifically, the local SNR is treated as a stochastic variable, and the Bayesian integration is approximately realized under a hyperprior distribution. The approach proposed in this paper additionally takes the multimodal nature of the resulting posterior distribution into account for speech inference: the extrema of the posterior distribution, which are easily obtained via differentiation, are combined according to their widths, heights and abscissae. The corresponding solution is not in closed form, but it is found within a few iterations. This approach delivers a spectral weighting of noisy speech that simultaneously maximizes instrumental criteria of speech quality, specifically the segmental SNR, STOI score and PESQ.
Published in: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 04-08 May 2020
Date Added to IEEE Xplore: 09 April 2020