Bi-Level Style and Prosody Decoupling Modeling for Personalized End-to-End Speech Synthesis

Abstract:

End-to-end frameworks can generate high-quality, high-similarity speech in the personalized speech synthesis task. However, generalization to out-of-domain texts remains challenging: limited target-speaker data leads to unacceptable errors and to poor prosody and speaker similarity in the synthetic speech. In this paper, we present a bi-level function decoupling framework that models and controls these factors separately to address these problems. First, at the style representation level, in contrast to conventional methods that use a single embedding to model all text-dependent discrepancies, we propose to model the speaker embedding and the prosody embedding separately, based on the reference audio and the phonetic posteriorgram (PPG), using a multi-head attention mechanism. Second, at the model structure level, the decoder is factored into an average-net and an adaptation-net, so that duration and prosody control and speaker timbre imitation are handled in largely separate parts of the model. Experimental results on a Mandarin dataset show that the proposed methods improve robustness, naturalness, and similarity.
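The style representation level described above can be illustrated with a minimal sketch of multi-head attention that pools a sequence of frames into a fixed-size embedding, applied twice to obtain separate speaker and prosody embeddings. This is not the paper's implementation. The function name `multi_head_attention`, the query vectors, the dimensions, the random placeholder features, and the wiring (speaker attends over reference-audio frames; prosody uses PPG frames as keys and reference-audio frames as values) are all assumptions made for illustration, and learned projection matrices are omitted, with each head simply operating on a slice of the feature dimension.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention(query, keys, values, num_heads):
    """Pool T frames into one vector with scaled dot-product attention.

    query:  list[float] of length d
    keys:   T frames, each a list[float] of length d
    values: T frames, each a list[float] of length d
    Returns (embedding of length d, one weight list per head).
    Simplification: no learned projections; head h uses slice h of d.
    """
    d = len(query)
    assert d % num_heads == 0, "feature size must divide evenly into heads"
    head_dim = d // num_heads
    embedding, all_weights = [], []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        q = query[lo:hi]
        # Scaled dot product between the query slice and each key slice.
        scores = [
            sum(a * b for a, b in zip(q, k[lo:hi])) / math.sqrt(head_dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value slices; heads are concatenated in order.
        for j in range(lo, hi):
            embedding.append(sum(w * v[j] for w, v in zip(weights, values)))
        all_weights.append(weights)
    return embedding, all_weights


# Placeholder inputs (random, purely illustrative).
random.seed(0)
d, T = 8, 5
ref_frames = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
ppg_frames = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
speaker_query = [random.gauss(0, 1) for _ in range(d)]  # hypothetical
prosody_query = [random.gauss(0, 1) for _ in range(d)]  # hypothetical

# Two separate embeddings instead of one shared style embedding.
speaker_emb, spk_w = multi_head_attention(
    speaker_query, ref_frames, ref_frames, num_heads=2
)
prosody_emb, pro_w = multi_head_attention(
    prosody_query, ppg_frames, ref_frames, num_heads=2
)
```

Keeping the two attention calls independent is what lets the speaker and prosody representations be controlled separately downstream; a real system would learn the queries and add projection layers.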

Published in: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Date of Conference: 06-11 June 2021

Date Added to IEEE Xplore: 13 May 2021

DOI: 10.1109/ICASSP39728.2021.9414422

Conference Location: Toronto, ON, Canada
