Personalized, Cross-Lingual TTS Using Phonetic Posteriorgrams

Sun, Lifa; Wang, Hao; Kang, Shiyin; Li, Kun; Meng, Helen

doi:10.21437/Interspeech.2016-1043

We present a novel approach that enables a target speaker (e.g. a monolingual Chinese speaker) to speak a new language (e.g. English) based on arbitrary textual input. Our system includes an English speaker-independent automatic speech recognition (SI-ASR) engine trained on TIMIT. Given the target speaker’s speech in a non-target language, we generate Phonetic PosteriorGrams (PPGs) with the SI-ASR and then train a Deep Bidirectional Long Short-Term Memory based Recurrent Neural Network (DBLSTM) to model the relationships between the PPGs and the acoustic signal. Synthesis involves feeding arbitrary text to a general TTS engine (trained on any non-target speaker), the output of which is converted to PPGs by the SI-ASR. These PPGs are used by the DBLSTM to synthesize the target language in the target speaker’s voice. A main advantage of this approach is its very low training data requirement for the target speaker, whose recordings can be in any language, in contrast to a reference approach that trains a dedicated TTS engine on many recordings from the target speaker in the target language only. For a given target speaker, our proposed approach trained on 100 Mandarin (i.e. non-target language) utterances achieves performance comparable (in MOS and ABX tests) to that of an HTS system trained on 1,000 English utterances when synthesizing English speech.
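The data flow described above can be sketched as follows. This is only an illustrative toy under stated assumptions, not the authors' implementation: the SI-ASR is replaced by a fixed random linear layer with a softmax, the DBLSTM is replaced by a nearest-neighbour lookup, and all names and sizes (`make_si_asr`, `train_ppg_to_acoustic`, `NUM_CLASSES`, `FEAT_DIM`) are hypothetical. Random vectors stand in for acoustic frames.

```python
import math
import random

NUM_CLASSES = 4  # hypothetical; a real PPG has one dimension per phonetic class
FEAT_DIM = 3     # hypothetical acoustic feature dimension


def make_si_asr(seed=0):
    """Stand-in for the SI-ASR (assumption): random linear layer + softmax."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(FEAT_DIM)]
               for _ in range(NUM_CLASSES)]

    def to_ppg(frames):
        # One posterior distribution over phonetic classes per frame.
        ppg = []
        for frame in frames:
            logits = [sum(w * x for w, x in zip(row, frame)) for row in weights]
            peak = max(logits)
            exps = [math.exp(v - peak) for v in logits]
            total = sum(exps)
            ppg.append([e / total for e in exps])
        return ppg

    return to_ppg


def train_ppg_to_acoustic(ppg, acoustic):
    """Stand-in for the DBLSTM (assumption): memorise (PPG, frame) pairs
    and answer queries by nearest-neighbour lookup in PPG space."""
    pairs = list(zip(ppg, acoustic))

    def predict(query_ppg):
        out = []
        for q in query_ppg:
            _, best = min(
                pairs,
                key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], q)),
            )
            out.append(best)
        return out

    return predict


rng = random.Random(1)
to_ppg = make_si_asr()

# Training: target speaker's speech (any language) -> PPGs -> acoustic model.
target_speech = [[rng.uniform(-1, 1) for _ in range(FEAT_DIM)] for _ in range(50)]
model = train_ppg_to_acoustic(to_ppg(target_speech), target_speech)

# Synthesis: general TTS output (non-target speaker) -> PPGs -> target voice.
source_tts_speech = [[rng.uniform(-1, 1) for _ in range(FEAT_DIM)] for _ in range(10)]
synth = model(to_ppg(source_tts_speech))
```

The point of the sketch is that the PPGs act as a speaker-independent intermediate representation: the same `to_ppg` function is applied to the target speaker's recordings during training and to the general TTS output during synthesis, so the mapping back to acoustics only ever needs the target speaker's data.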

Cite as: Sun, L., Wang, H., Kang, S., Li, K., Meng, H. (2016) Personalized, Cross-Lingual TTS Using Phonetic Posteriorgrams. Proc. Interspeech 2016, 322-326, doi: 10.21437/Interspeech.2016-1043

@inproceedings{sun16_interspeech,
  author={Lifa Sun and Hao Wang and Shiyin Kang and Kun Li and Helen Meng},
  title={{Personalized, Cross-Lingual TTS Using Phonetic Posteriorgrams}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={322--326},
  doi={10.21437/Interspeech.2016-1043}
}