Average Modeling Approach to Voice Conversion with Non-Parallel Data

Tian, Xiaohai; Wang, Junchao; Xu, Haihua; Chng, Eng-Siong; Li, Haizhou

doi:10.21437/Odyssey.2018-32

Voice conversion techniques typically require parallel source-target speech data for model training. Such parallel data is not always available in practice. This paper presents a non-parallel data approach, which we call the average modeling approach. The proposed approach uses a multi-speaker average model that maps speaker-independent linguistic features to speaker-dependent acoustic features. In particular, we present two practical implementations: 1) adapting the average model towards the target speaker with a small amount of target data, and 2) presenting speaker identity as an additional input to the average model to generate target speech. Because the linguistic and acoustic features can be extracted from the same utterance, the proposed approach does not require parallel data in either average model training or adaptation. We report experiments on the Voice Conversion Challenge 2018 (VCC2018) database that validate the effectiveness of the proposed method.
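
The second implementation (speaker identity as an additional input) can be illustrated with a toy sketch. This is not the authors' system: it assumes a linear least-squares model in place of the neural average model, a one-hot speaker code, and synthetic features. All function names and dimensions here are hypothetical. It only shows why no parallel data is needed: each training pair comes from a single utterance of a single speaker.

```python
import numpy as np

def build_inputs(linguistic, speaker_id, num_speakers):
    """Append a one-hot speaker code to every frame of linguistic features."""
    code = np.zeros(num_speakers)
    code[speaker_id] = 1.0
    codes = np.tile(code, (linguistic.shape[0], 1))
    return np.hstack([linguistic, codes])

def train_average_model(utterances, num_speakers):
    """Fit one shared model on pooled multi-speaker data.

    utterances: list of (linguistic [T, D_l], acoustic [T, D_a], speaker_id).
    Assumption: a linear map stands in for the paper's average model.
    """
    X = np.vstack([build_inputs(l, s, num_speakers) for l, _, s in utterances])
    Y = np.vstack([a for _, a, _ in utterances])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def convert(W, source_linguistic, target_id, num_speakers):
    """Generate target-speaker acoustics from source linguistic features."""
    return build_inputs(source_linguistic, target_id, num_speakers) @ W

# Synthetic, non-parallel data: each speaker utters different content.
rng = np.random.default_rng(0)
D_l, D_a, S = 4, 3, 2
A = rng.normal(size=(D_l, D_a))   # shared linguistic-to-acoustic map (toy)
b = rng.normal(size=(S, D_a))     # per-speaker offset (toy speaker identity)
utterances = []
for s in range(S):
    ling = rng.normal(size=(50, D_l))
    utterances.append((ling, ling @ A + b[s], s))

W = train_average_model(utterances, S)
source_ling = rng.normal(size=(10, D_l))
converted = convert(W, source_ling, target_id=1, num_speakers=S)
```

In this toy setting, swapping the speaker code at conversion time is all that changes between reconstructing the source speaker and generating the target speaker. The first implementation would instead fine-tune the shared model on the small amount of target data.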

Cite as: Tian, X., Wang, J., Xu, H., Chng, E.-S., Li, H. (2018) Average Modeling Approach to Voice Conversion with Non-Parallel Data. Proc. The Speaker and Language Recognition Workshop (Odyssey 2018), 227-232, doi: 10.21437/Odyssey.2018-32

@inproceedings{tian18_odyssey,
  author={Xiaohai Tian and Junchao Wang and Haihua Xu and Eng-Siong Chng and Haizhou Li},
  title={{Average Modeling Approach to Voice Conversion with Non-Parallel Data}},
  year=2018,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2018)},
  pages={227--232},
  doi={10.21437/Odyssey.2018-32}
}