Abstract:
Non-parallel voice conversion is an important but challenging task because parallel training data are unavailable. Recently, voice conversion based on Cycle-Consistent Generative Adversarial Networks (CycleGAN-VC) has achieved great success in this setting. However, because GAN training is unstable, a large gap remains between real target speech and converted speech. To reduce this gap, this paper proposes CycleGAN-VC-GP, which incorporates two new techniques: (1) zero-centered gradient penalties are used to ensure the convergence of GAN training; (2) the fundamental frequency is combined with the spectrum to improve prosody conversion. Both subjective and objective evaluations showed that the proposed method achieved higher speaker similarity and comparable speech quality compared with the state-of-the-art method based on CycleGAN-VC.
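As a rough illustration of technique (1): a zero-centered gradient penalty pushes the norm of the discriminator's input gradient toward 0, rather than toward 1 as in the WGAN-GP penalty. The sketch below assumes one common form, (gamma / 2) times the mean squared gradient norm evaluated on real samples; the abstract does not state which samples or which weight the paper uses. The toy discriminator D(x) = tanh(w . x + b), the function names, and the value of `gamma` are all hypothetical and chosen only so that the gradient can be written analytically without a deep learning library.

```python
import math


def toy_discriminator_grad(x, w, b):
    """Gradient of the toy discriminator D(x) = tanh(w . x + b) w.r.t. x.

    Hypothetical stand-in for a real network: d/dx tanh(z) = (1 - tanh(z)^2) * w.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    scale = 1.0 - math.tanh(z) ** 2
    return [scale * wi for wi in w]


def zero_centered_gp(real_batch, w, b, gamma=10.0):
    """(gamma / 2) * mean over the batch of ||grad_x D(x)||^2.

    Assumption: the penalty is evaluated on real samples only, and
    gamma=10.0 is an illustrative weight, not a value from the paper.
    """
    total = 0.0
    for x in real_batch:
        grad = toy_discriminator_grad(x, w, b)
        total += sum(g * g for g in grad)
    return 0.5 * gamma * total / len(real_batch)


if __name__ == "__main__":
    # Two made-up 2-dimensional "feature frames" standing in for real speech features.
    batch = [[0.0, 0.0], [0.5, -0.25]]
    print(zero_centered_gp(batch, w=[3.0, 4.0], b=0.0, gamma=2.0))
```

In an actual GAN setup this term would be added to the discriminator loss, with the gradient obtained by automatic differentiation instead of the closed form used here.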
Date of Conference: 28-31 October 2020
Date Added to IEEE Xplore: 24 December 2020