ISCA Archive Interspeech 2021
ISCA Archive Interspeech 2021

High-Fidelity Parallel WaveGAN with Multi-Band Harmonic-Plus-Noise Model

Min-Jae Hwang, Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim

This paper proposes a multi-band harmonic-plus-noise (HN) Parallel WaveGAN (PWG) vocoder. To generate a high-fidelity speech signal, it is important to well-reflect the harmonic-noise characteristics of the speech waveform in the time-frequency domain. However, it is difficult for the conventional PWG model to accurately match this condition, as its single generator inefficiently represents the complicated nature of harmonic-noise structures. In the proposed method, the HN WaveNet models are employed to overcome this limitation, which enable the separate generation of the harmonic and noise components of speech signals from the pitch-dependent sine wave and Gaussian noise sources, respectively. Then, the energy ratios between harmonic and noise components in multiple frequency bands (i.e., subband harmonicities) are predicted by an additional harmonicity estimator. Weighted by the estimated harmonicities, the gain of harmonic and noise components in each subband is adjusted, and finally mixed together to compose the full-band speech signal. Subjective evaluation results showed that the proposed method significantly improved the perceptual quality of the synthesized speech.


doi: 10.21437/Interspeech.2021-976

Cite as: Hwang, M.-J., Yamamoto, R., Song, E., Kim, J.-M. (2021) High-Fidelity Parallel WaveGAN with Multi-Band Harmonic-Plus-Noise Model. Proc. Interspeech 2021, 2227-2231, doi: 10.21437/Interspeech.2021-976

@inproceedings{hwang21_interspeech,
  author={Min-Jae Hwang and Ryuichi Yamamoto and Eunwoo Song and Jae-Min Kim},
  title={{High-Fidelity Parallel WaveGAN with Multi-Band Harmonic-Plus-Noise Model}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={2227--2231},
  doi={10.21437/Interspeech.2021-976}
}