Phoneme Selective Speech Enhancement Using Parametric Estimators and the Mixture Maximum Model: A Unifying Approach


Abstract:

This study presents a ROVER speech enhancement algorithm that employs a series of prior enhanced utterances, each customized for a specific broad phoneme class, to generate a single composite utterance with improved overall objective quality across all classes. The noisy utterance is first partitioned into speech and non-speech regions using a voice activity detector. A mixture maximum (MIXMAX) model then makes probabilistic decisions in the speech regions to determine phoneme class weights. The prior enhanced utterances are weighted by these decisions and combined to form the final composite utterance. The enhancement system that generates the prior enhanced utterances comprises a family of parametric gain functions whose parameters can be varied to achieve high enhancement levels per phoneme class. These parametric gain functions are derived by 1) using a weighted Euclidean distortion cost function, and 2) modeling clean speech spectral magnitudes or discrete Fourier transform coefficients by Chi or two-sided Gamma priors, respectively. The special cases of these gain functions are the generalized spectral subtraction (GSS), minimum mean square error (MMSE), two-sided Gamma, or joint maximum a posteriori (MAP) estimators. Evaluations over two noise types and signal-to-noise ratios (SNRs) ranging from −5 dB to 10 dB suggest that the proposed ROVER algorithm outperforms not only the special case estimators but also the family of parametric estimators when all phoneme classes are jointly considered.
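
The combination step described above (class weights applied to the prior enhanced utterances) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `combine_enhanced`, the array layout, the use of magnitude spectra, and the rule that non-speech frames take one designated stream are all assumptions made for illustration. The class weights are taken as given here; the MIXMAX model that would produce them is not shown.

```python
import numpy as np

def combine_enhanced(enhanced, class_weights, speech_mask, nonspeech_index=0):
    """Weighted combination of class-specific enhanced spectra (illustrative).

    enhanced      : (K, T, F) array, one enhanced magnitude spectrogram per
                    phoneme class (K classes, T frames, F frequency bins).
    class_weights : (T, K) array of per-frame class weights, assumed
                    non-negative with a positive sum in every speech frame.
    speech_mask   : (T,) boolean array from a voice activity detector.
    nonspeech_index : stream used for non-speech frames (an assumption,
                    the abstract does not state how these frames are handled).
    """
    enhanced = np.asarray(enhanced, dtype=float)
    weights = np.array(class_weights, dtype=float)  # copy, input is not modified
    speech_mask = np.asarray(speech_mask, dtype=bool)

    # Non-speech frames: all weight goes to the designated stream.
    weights[~speech_mask] = 0.0
    weights[~speech_mask, nonspeech_index] = 1.0

    # Normalize so the weights in each frame sum to one.
    weights /= weights.sum(axis=1, keepdims=True)

    # composite[t, f] = sum over k of weights[t, k] * enhanced[k, t, f]
    return np.einsum("tk,ktf->tf", weights, enhanced)
```

With two constant streams (values 1 and 3), frame weights of (0.5, 0.5) yield 2 in every bin, and a frame marked as non-speech simply copies the designated stream.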
Published in: IEEE Transactions on Audio, Speech, and Language Processing (Volume: 20, Issue: 8, October 2012)
Page(s): 2265 - 2279
Date of Publication: 12 June 2012
