An Iterative Phase Recovery Framework with Phase Mask for Spectral Mapping with an Application to Speech Enhancement

Li, Kehuang; Wu, Bo; Lee, Chin-Hui

doi:10.21437/Interspeech.2016-494

We propose an iterative phase recovery framework to improve spectral mapping, and apply it to state-of-the-art speech enhancement systems that use magnitude-based spectral mapping with deep neural networks (DNNs). We further propose to use an estimated time-frequency mask to reduce sign uncertainty in the overlap-add waveform reconstruction algorithm. In a series of enhancement experiments with a DNN baseline system, we directly replace the original phase of the noisy speech with the phase estimated by a classical phase recovery algorithm. Averaged over a wide range of signal and noise conditions, the proposed iterative technique reduces the log-spectral distortion (LSD) by 0.41 dB and increases the perceptual evaluation of speech quality (PESQ) score by 0.05 relative to the DNN baseline. The proposed phase mask mechanism further increases the segmental signal-to-noise ratio (SegSNR) by 0.44 dB, at the expense of a slight degradation in LSD and PESQ compared with the algorithm without a phase mask.

Cite as: Li, K., Wu, B., Lee, C.-H. (2016) An Iterative Phase Recovery Framework with Phase Mask for Spectral Mapping with an Application to Speech Enhancement. Proc. Interspeech 2016, 3773-3777, doi: 10.21437/Interspeech.2016-494

@inproceedings{li16o_interspeech,
  author={Kehuang Li and Bo Wu and Chin-Hui Lee},
  title={{An Iterative Phase Recovery Framework with Phase Mask for Spectral Mapping with an Application to Speech Enhancement}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={3773--3777},
  doi={10.21437/Interspeech.2016-494}
}