Neural Homomorphic Vocoder

Liu, Zhijun; Chen, Kuan; Yu, Kai

doi:10.21437/Interspeech.2020-3188

In this paper, we propose the neural homomorphic vocoder (NHV), a neural vocoder framework based on the source-filter model. NHV synthesizes speech by filtering impulse trains and noise with linear time-varying (LTV) filters. A neural network controls the LTV filters by estimating the complex cepstra of time-varying impulse responses from acoustic features. The framework can be trained with a combination of multi-resolution STFT loss and adversarial loss functions. Because synthesis relies on DSP methods, NHV is highly efficient, fully controllable, and interpretable. Under this framework, we built a vocoder that synthesizes speech from log-Mel spectrograms and fundamental frequencies. Although the model costs only 15 kFLOPs per sample, its synthesis quality remains comparable to that of baseline neural vocoders in both copy-synthesis and text-to-speech.
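The abstract describes the synthesis path only at a high level. Below is a minimal NumPy sketch of that source-filter idea. It is not the authors' implementation. The following are all hypothetical choices made for illustration:

- the frame layout;
- the periodic-Hann, 50%-overlap overlap-add scheme for the LTV filters;
- the `fftshift` used to make mixed-phase responses causal;
- the function names `impulse_train`, `cepstrum_to_ir`, `num_frames`, `ltv_filter` and `synthesize`.

The sketch relies on one standard relation: a complex cepstrum `c` maps to a frequency response via `H = exp(DFT(c))`. The neural network that predicts the cepstra and the training losses are omitted. The cepstra are taken as given per-frame arrays.

```python
# Hypothetical sketch of NHV-style source-filter synthesis; not the reference code.
import numpy as np


def impulse_train(f0, sr):
    """Unit impulse train from per-sample F0 in Hz via phase accumulation.
    Samples with f0 == 0 (unvoiced) never trigger an impulse."""
    phase = np.cumsum(f0 / sr)                  # elapsed pitch cycles per sample
    cycles = np.floor(phase)
    prev = np.concatenate(([0.0], cycles[:-1]))
    return (cycles > prev).astype(np.float64)   # 1 where a new cycle starts


def cepstrum_to_ir(ccep):
    """Impulse response from a real complex cepstrum stored in DFT order
    (quefrency 0, 1, ..., with negative quefrencies wrapped at the end).
    H = exp(DFT(c)), h = IDFT(H); fftshift moves the anticausal part in
    front, so the response is delayed by len(ccep) // 2 samples."""
    spectrum = np.exp(np.fft.fft(ccep))
    return np.fft.fftshift(np.real(np.fft.ifft(spectrum)))


def num_frames(length, hop):
    """Frames needed so every sample lies under two 50%-overlap windows."""
    return (length - 1) // hop + 2


def ltv_filter(x, irs, hop):
    """Filter x with a different FIR per frame (rows of irs), using
    Hann-windowed 50%-overlap segments and overlap-add."""
    n_frames, ir_len = irs.shape
    assert n_frames == num_frames(len(x), hop)
    win = np.hanning(2 * hop + 1)[:-1]          # periodic Hann: overlapping pairs sum to 1
    padded = np.concatenate((np.zeros(hop), x, np.zeros(2 * hop)))
    out = np.zeros(len(padded) + ir_len)
    for i in range(n_frames):
        start = i * hop                         # frame i is centred on x[i * hop]
        seg = padded[start:start + 2 * hop] * win
        out[start:start + 2 * hop + ir_len - 1] += np.convolve(seg, irs[i])
    delay = ir_len // 2                         # undo the fftshift delay
    return out[hop + delay:hop + delay + len(x)]


def synthesize(f0_frames, harm_cep, noise_cep, hop, sr, seed=0):
    """Filtered impulse train plus filtered Gaussian noise.
    harm_cep and noise_cep have shape (n_frames, n_fft), one cepstrum per frame."""
    n_frames = len(f0_frames)
    length = (n_frames - 1) * hop
    # Linearly interpolate frame-rate F0 to the sample rate.
    f0 = np.interp(np.arange(length), np.arange(n_frames) * hop, f0_frames)
    pulses = impulse_train(f0, sr)
    noise = np.random.default_rng(seed).standard_normal(length)
    h_harm = np.stack([cepstrum_to_ir(c) for c in harm_cep])
    h_noise = np.stack([cepstrum_to_ir(c) for c in noise_cep])
    return ltv_filter(pulses, h_harm, hop) + ltv_filter(noise, h_noise, hop)
```

With an all-zero cepstrum, every frame's response is a single (delayed) impulse, so `ltv_filter` returns its input unchanged. That makes a convenient check of the overlap-add bookkeeping. In a trained system, the zero arrays would be replaced by per-frame cepstra predicted from log-Mel spectrograms and F0.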

Cite as: Liu, Z., Chen, K., Yu, K. (2020) Neural Homomorphic Vocoder. Proc. Interspeech 2020, 240-244, doi: 10.21437/Interspeech.2020-3188

@inproceedings{liu20_interspeech,
  author={Zhijun Liu and Kuan Chen and Kai Yu},
  title={{Neural Homomorphic Vocoder}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={240--244},
  doi={10.21437/Interspeech.2020-3188}
}