ISCA Archive Interspeech 2022

Efficient Speech Enhancement with Neural Homomorphic Synthesis

Wenbin Jiang, Tao Liu, Kai Yu

Most existing deep neural network based speech enhancement methods operate on the short-time Fourier transform domain or on alternatively learned features without employing a speech production model. In this work, we present an efficient speech enhancement algorithm based on the source-filter model of speech production. Concretely, we separate the framed speech into excitation and vocal tract components by homomorphic filtering, adopt two convolutional recurrent networks to estimate the reference magnitudes of the separated components, and synthesize the minimum-phase signal from the estimated components. Finally, the enhanced speech is obtained by a post-processing procedure that reuses the noisy phase and applies overlap-add. Experimental results demonstrate that the proposed method achieves performance comparable to a state-of-the-art complex-valued neural network based method. In addition, extensive experiments show that the proposed method is more efficient, with a more compact model.
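The homomorphic separation step described above can be sketched as cepstral liftering: the log-magnitude spectrum is transformed to the cepstral domain, where the low-quefrency part corresponds to the vocal tract (spectral envelope) and the residual to the excitation. The sketch below is an illustration only; the lifter cutoff, FFT size, and function names are assumptions, not the paper's exact configuration.

```python
import numpy as np

def homomorphic_split(frame, n_fft=512, lifter_cutoff=30):
    """Split a windowed speech frame into vocal-tract and excitation
    magnitude spectra via cepstral liftering (illustrative sketch;
    lifter_cutoff and n_fft are assumed values, not from the paper)."""
    spec = np.fft.rfft(frame, n_fft)
    log_mag = np.log(np.abs(spec) + 1e-8)
    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag, n_fft)
    # Low-quefrency lifter keeps the vocal-tract (envelope) component;
    # the lifter is symmetric because the real cepstrum is even.
    lifter = np.zeros(n_fft)
    lifter[:lifter_cutoff] = 1.0
    lifter[-lifter_cutoff + 1:] = 1.0
    vt_log = np.fft.rfft(cepstrum * lifter, n_fft).real
    # Excitation is the residual in the log-magnitude domain, so the two
    # components recombine multiplicatively: vocal_tract * excitation = |spec|.
    ex_log = log_mag - vt_log
    return np.exp(vt_log), np.exp(ex_log)
```

By construction the two magnitudes multiply back to the original magnitude spectrum, which is the property the synthesis stage relies on after the networks have enhanced each component separately.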


doi: 10.21437/Interspeech.2022-10411

Cite as: Jiang, W., Liu, T., Yu, K. (2022) Efficient Speech Enhancement with Neural Homomorphic Synthesis. Proc. Interspeech 2022, 986-990, doi: 10.21437/Interspeech.2022-10411

@inproceedings{jiang22b_interspeech,
  author={Wenbin Jiang and Tao Liu and Kai Yu},
  title={{Efficient Speech Enhancement with Neural Homomorphic Synthesis}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={986--990},
  doi={10.21437/Interspeech.2022-10411}
}