Journals & Magazines >IEEE Signal Processing Letters >Volume: 30

Towards Zero-Shot Multi-Speaker Multi-Accent Text-to-Speech Synthesis

Abstract:

This letter presents a framework for zero-shot multi-speaker, multi-accent neural text-to-speech synthesis. It employs an encoder-decoder architecture and an accent classifier that controls the pronunciation variation in the encoder outputs. The encoder and decoder are pre-trained on a large-scale multi-speaker corpus, and the attention-based decoder takes the accent-informed encoder outputs to generate accented prosody. The framework can be fine-tuned with limited training data from multiple accents and can generate accented speech for unseen speakers. Both objective and subjective evaluations confirm its effectiveness.
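
The abstract describes the architecture only at a high level. The following pure-Python sketch illustrates, under stated assumptions, how an accent classifier and an attention-based decoder can both consume the same encoder outputs. The additive accent embedding, the mean-pooled linear accent classifier, the scaled dot-product attention, and every function name below (accent_informed, accent_classifier_loss, attention_context) are hypothetical choices made for illustration; they are not the letter's actual method.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def accent_informed(encoder_outputs: Matrix, accent_embedding: Vector) -> Matrix:
    """Hypothetical conditioning: add one accent embedding to every encoder frame."""
    return [[h + a for h, a in zip(frame, accent_embedding)] for frame in encoder_outputs]


def accent_classifier_loss(encoder_outputs: Matrix, weights: Matrix, accent_id: int) -> float:
    """Hypothetical classifier: cross-entropy of a linear layer on mean-pooled frames."""
    dim = len(encoder_outputs[0])
    pooled = [sum(frame[d] for frame in encoder_outputs) / len(encoder_outputs) for d in range(dim)]
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in weights]
    probs = softmax(logits)
    return -math.log(probs[accent_id])


def attention_context(query: Vector, memory: Matrix) -> Vector:
    """One decoder step of scaled dot-product attention over the encoder memory."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * m for q, m in zip(query, frame)) for frame in memory]
    alignment = softmax(scores)  # attention weights over encoder frames, summing to 1
    dim = len(memory[0])
    return [sum(a * frame[d] for a, frame in zip(alignment, memory)) for d in range(dim)]


# Toy usage with made-up numbers: 3 encoder frames, 2 dimensions, 2 accents.
enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
memory = accent_informed(enc, accent_embedding=[0.5, -0.5])
loss = accent_classifier_loss(memory, weights=[[1.0, 0.0], [0.0, 1.0]], accent_id=0)
ctx = attention_context([1.0, 0.0], memory)
```

In this sketch the classifier loss would be added to the synthesis loss during training so that the encoder outputs carry accent information, while the decoder reads those same outputs through attention; a real system would use learned tensors and a deep-learning framework rather than Python lists.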

Published in: IEEE Signal Processing Letters (Volume: 30)

Page(s): 947-951

Date of Publication: 05 July 2023

DOI: 10.1109/LSP.2023.3292740
