Towards Zero-Shot Multi-Speaker Multi-Accent Text-to-Speech Synthesis

Abstract:

This letter presents a framework for zero-shot multi-speaker, multi-accent neural text-to-speech synthesis. The framework employs an encoder-decoder architecture and an accent classifier that controls the pronunciation variation produced by the encoder. The encoder and decoder are pre-trained on a large-scale multi-speaker corpus. The attention-based decoder takes the accent-informed encoder outputs and generates accented prosody. The framework can be fine-tuned with limited training data from multiple accents and can generate accented speech for unseen speakers. Both objective and subjective evaluations confirm its effectiveness.
Published in: IEEE Signal Processing Letters (Volume: 30)
Page(s): 947 - 951
Date of Publication: 05 July 2023
