Controllable Speech Representation Learning Via Voice Conversion and AIC Loss

Abstract:

Speech representation learning transforms speech into features that are suitable for downstream tasks, e.g. speech recognition, phoneme classification, or speaker identification. For such recognition tasks, a representation can be lossy (non-invertible), which is typical of BERT-like self-supervised models. However, when these lossy representations are used for synthesis tasks, we find that they are insufficient to plausibly reconstruct the input signal. This paper introduces a method for invertible and controllable speech representation learning based on disentanglement. The representation can be decoded into a signal perceptually identical to the original. Moreover, its disentangled components (content, pitch, speaker identity, and energy) can be controlled independently to alter the synthesis result. Our model builds upon AutoVC-F0, a zero-shot voice conversion model, into which we introduce an alteration invariant content loss (AIC loss) and adversarial training (GAN). Through objective measures and subjective tests, we show that our formulation significantly improves voice conversion sound quality and gives more precise control over the disentangled features.
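The abstract names the AIC loss but does not give its formula. The following is a minimal sketch under an assumed form: extract a content code, resynthesize speech from it with altered controls (a different speaker or pitch), re-encode the result, and penalize the L1 distance between the two content codes. The functions `aic_loss`, `toy_decoder`, `disentangled_encoder`, and `leaky_encoder` are hypothetical stand-ins, not the authors' implementation. Frames are plain lists of floats so the example stays self-contained.

```python
from typing import Callable, List, Sequence

Frame = List[float]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def aic_loss(
    frames: Sequence[Frame],
    encode_content: Callable[[Frame], Frame],
    resynthesize: Callable[[Frame], Frame],
) -> float:
    """Hypothetical alteration-invariant content loss (assumed form).

    For each frame: extract a content code, resynthesize speech from that
    code with altered controls (e.g. a new speaker or pitch), re-encode the
    result, and average the L1 gap between the two content codes.
    """
    if not frames:
        raise ValueError("need at least one frame")
    total = 0.0
    for frame in frames:
        content = encode_content(frame)
        altered = resynthesize(content)
        total += l1_distance(content, encode_content(altered))
    return total / len(frames)


# Toy stand-ins (hypothetical): speaker and pitch act as a constant offset.
def toy_decoder(content: Frame, speaker: float, pitch: float) -> Frame:
    offset = speaker + 0.1 * pitch
    return [c + offset for c in content]


def disentangled_encoder(frame: Frame) -> Frame:
    # Removes the per-frame mean, so a constant offset cannot leak through.
    mean = sum(frame) / len(frame)
    return [x - mean for x in frame]


def leaky_encoder(frame: Frame) -> Frame:
    # Keeps everything, so speaker/pitch information leaks into "content".
    return list(frame)


if __name__ == "__main__":
    frames = [[1.0, 2.0, 3.0], [0.0, 4.0, -1.0]]

    def altered(c: Frame) -> Frame:
        return toy_decoder(c, speaker=0.5, pitch=2.0)

    print(aic_loss(frames, disentangled_encoder, altered))  # close to 0.0
    print(aic_loss(frames, leaky_encoder, altered))  # close to 0.7
```

In this toy setup, an encoder that strips the speaker and pitch offset scores near zero. An encoder that passes the offset through is penalized by the size of that offset. In a real model the encoder and decoder would be neural networks, and a term like this would be combined with AutoVC-F0's other training losses and the adversarial loss. The weighting between these terms is not specified in the abstract.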

Published in: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Date of Conference: 23-27 May 2022

Date Added to IEEE Xplore: 27 April 2022

DOI: 10.1109/ICASSP43922.2022.9747590

Conference Location: Singapore, Singapore