ISCA Archive Interspeech 2021

Cross-Lingual Voice Conversion with a Cycle Consistency Loss on Linguistic Representation

Yi Zhou, Xiaohai Tian, Zhizheng Wu, Haizhou Li

Cross-Lingual Voice Conversion (XVC) aims to change the speaker identity of source speech to that of a target speaker while preserving the source linguistic content. This paper introduces a cycle consistency loss on linguistic representation to ensure that the speech content remains unchanged after conversion. The proposed XVC model is optimized with two loss functions: a spectral reconstruction loss and a linguistic cycle consistency loss. The cycle consistency loss seeks to maintain the linguistic content of the source speech. Specifically, we use Phonetic PosteriorGrams (PPGs) to represent the linguistic content. XVC experiments were conducted between English and Mandarin. Both objective and subjective evaluations demonstrated that the proposed cycle consistency loss makes the converted speech more intelligible.
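The two-term training objective described above can be sketched as follows. This is a minimal illustration under stated assumptions and is not the authors' implementation. The abstract does not specify the distance measures or how the terms are weighted, so the L1 distances, the `cycle_weight` parameter, and the helper names `mean_abs_error` and `xvc_loss` are all hypothetical. The sketch also assumes that `ppg_converted` is obtained by passing the converted speech back through a pretrained PPG extractor, so that it can be compared frame by frame with the source PPGs.

```python
from typing import List

# A matrix of per-frame feature vectors: frames x feature dimensions.
Matrix = List[List[float]]


def mean_abs_error(a: Matrix, b: Matrix) -> float:
    """Mean absolute (L1) difference between two equally shaped frame matrices."""
    if len(a) != len(b) or any(len(row_a) != len(row_b) for row_a, row_b in zip(a, b)):
        raise ValueError("shape mismatch between matrices")
    count = sum(len(row) for row in a)
    if count == 0:
        raise ValueError("matrices must not be empty")
    total = sum(
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )
    return total / count


def xvc_loss(
    mel_pred: Matrix,
    mel_target: Matrix,
    ppg_source: Matrix,
    ppg_converted: Matrix,
    cycle_weight: float = 1.0,  # hypothetical weighting, not given in the abstract
) -> float:
    """Spectral reconstruction loss plus a weighted linguistic cycle consistency loss.

    Assumption: ppg_converted holds PPGs re-extracted from the converted speech,
    so the cycle term penalizes changes in linguistic content after conversion.
    """
    spectral = mean_abs_error(mel_pred, mel_target)   # reconstruction term (L1 assumed)
    cycle = mean_abs_error(ppg_converted, ppg_source)  # PPG cycle term (L1 assumed)
    return spectral + cycle_weight * cycle
```

For example, `xvc_loss([[1.0, 2.0]], [[1.0, 1.0]], [[0.9, 0.1]], [[0.7, 0.3]], cycle_weight=0.5)` combines a spectral term of 0.5 with a weighted cycle term of about 0.1. Because the PPG extractor is assumed to stay fixed, the cycle term can only be lowered by producing converted speech whose recognized phonetic content matches the source.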


doi: 10.21437/Interspeech.2021-687

Cite as: Zhou, Y., Tian, X., Wu, Z., Li, H. (2021) Cross-Lingual Voice Conversion with a Cycle Consistency Loss on Linguistic Representation. Proc. Interspeech 2021, 1374-1378, doi: 10.21437/Interspeech.2021-687

@inproceedings{zhou21c_interspeech,
  author={Yi Zhou and Xiaohai Tian and Zhizheng Wu and Haizhou Li},
  title={{Cross-Lingual Voice Conversion with a Cycle Consistency Loss on Linguistic Representation}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1374--1378},
  doi={10.21437/Interspeech.2021-687}
}