Zero-Shot Voice Conversion Based on Speaker Embedding Domain Generalization

Abstract:

This paper presents a zero-shot voice conversion framework built by decoupling the semantic and speaker features in speech. The proposed method uses a pre-trained wav2vec 2.0 model to extract semantic features from the source speaker and a WavLM model to extract speaker features from the target speaker. We propose the Robust-MAML model to map the target speaker's features into a domain-generalized space, making them directly applicable to any unregistered speaker domain. Finally, through transfer learning, the speech synthesis model FastSpeech2 integrates the semantic features and the domain-generalized speaker features to synthesize speech in the target speaker's voice. Experimental results show that the proposed method outperforms common baseline systems in both naturalness and speaker similarity.
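
The data flow described in the abstract can be sketched as below. This is only an illustrative outline under stated assumptions, not the authors' implementation: the class `ZeroShotVC` and all `toy_*` functions are hypothetical stand-ins that do no real speech processing, and conditioning by concatenating the speaker vector onto every frame is an assumption, since the abstract does not say how FastSpeech2 combines the two feature streams.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]
Frames = List[Vector]


@dataclass
class ZeroShotVC:
    """Data flow only. Every component is a caller-supplied callable."""
    content_encoder: Callable[[Vector], Frames]      # role played by wav2vec 2.0
    speaker_encoder: Callable[[Vector], Vector]      # role played by WavLM
    speaker_mapper: Callable[[Vector], Vector]       # role played by Robust-MAML
    synthesizer: Callable[[Frames, Vector], Frames]  # role played by FastSpeech2

    def convert(self, source_wav: Vector, target_wav: Vector) -> Frames:
        semantic = self.content_encoder(source_wav)   # what is said (source)
        speaker = self.speaker_encoder(target_wav)    # who should say it (target)
        generalized = self.speaker_mapper(speaker)    # domain-generalized embedding
        return self.synthesizer(semantic, generalized)


# Hypothetical toy components, used only to make the sketch executable.
def toy_content_encoder(wav: Vector, hop: int = 4) -> Frames:
    # One 1-dimensional "frame" per chunk of `hop` samples (chunk mean).
    chunks = [wav[i:i + hop] for i in range(0, len(wav), hop)]
    return [[sum(c) / len(c)] for c in chunks]


def toy_speaker_encoder(wav: Vector) -> Vector:
    # One utterance-level vector: mean and mean absolute amplitude.
    n = len(wav)
    return [sum(wav) / n, sum(abs(x) for x in wav) / n]


def toy_speaker_mapper(emb: Vector) -> Vector:
    # Placeholder for the learned mapping: L2 normalization.
    norm = math.sqrt(sum(x * x for x in emb)) or 1.0
    return [x / norm for x in emb]


def toy_synthesizer(frames: Frames, spk: Vector) -> Frames:
    # Assumed conditioning: append the speaker vector to every frame.
    return [frame + spk for frame in frames]


vc = ZeroShotVC(toy_content_encoder, toy_speaker_encoder,
                toy_speaker_mapper, toy_synthesizer)
features = vc.convert([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], [3.0, -4.0])
print(len(features), len(features[0]))
```

In a real system each callable would wrap a pre-trained network, and the synthesizer would emit acoustic frames for a vocoder. Keeping the four stages as separate callables reflects the decoupling the abstract describes: the target speaker can change without touching the semantic path.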

Published in: 2023 RIVF International Conference on Computing and Communication Technologies (RIVF)

Date of Conference: 23-25 December 2023

Date Added to IEEE Xplore: 22 March 2024

DOI: 10.1109/RIVF60135.2023.10471830

Conference Location: Hanoi, Vietnam
