Abstract:
Emotional voice conversion (EVC) transforms the emotional state of speech while preserving linguistic content and speaker identity. Although sequence-to-sequence models have achieved significant success in EVC with limited, non-parallel data, they lack mechanisms for automatically evaluating and improving their own performance, which limits their potential. Unlike humans, who can hear and self-assess their own voices to improve emotional expression, existing EVC systems cannot evaluate the success of their conversions or improve themselves accordingly. To address this gap, we propose a novel feedback-driven self-improvement mechanism within the EVC framework. This mechanism allows the system to assess its own performance and iteratively refine its outputs. We further enhance performance by introducing an emotion-aware vocoder and a differentiable prosodic predictor. Our objective and subjective evaluations, conducted on non-parallel and limited emotional datasets, demonstrate that this framework outperforms existing state-of-the-art approaches in emotion expression, audio quality, and inference speed.
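The abstract describes the feedback-driven self-improvement mechanism only at a high level (the system assesses its own conversion and iteratively refines its output). As a minimal illustrative sketch of that general idea, the Python snippet below shows a generic convert-score-retry loop; the function and parameter names (convert_emotion, score_emotion, max_rounds, accept_threshold) are hypothetical placeholders and do not reflect the paper's actual components or training procedure.

```python
# Illustrative sketch of a feedback-driven refinement loop (not the paper's method).
# convert_emotion and score_emotion stand in for a converter model and an
# internal emotion-assessment module, respectively.
from typing import Callable, List


def self_improving_conversion(
    source_utterance: List[float],
    target_emotion: str,
    convert_emotion: Callable[[List[float], str], List[float]],
    score_emotion: Callable[[List[float], str], float],
    max_rounds: int = 5,
    accept_threshold: float = 0.9,
) -> List[float]:
    """Convert, self-assess, and re-convert until the internal score is acceptable."""
    best_output = convert_emotion(source_utterance, target_emotion)
    best_score = score_emotion(best_output, target_emotion)
    for _ in range(max_rounds):
        if best_score >= accept_threshold:
            break  # the system judges its own conversion good enough
        # Feed the previous output back and attempt another refinement pass.
        candidate = convert_emotion(best_output, target_emotion)
        candidate_score = score_emotion(candidate, target_emotion)
        if candidate_score > best_score:
            best_output, best_score = candidate, candidate_score
    return best_output


if __name__ == "__main__":
    # Dummy stand-ins so the sketch runs end to end without any model weights.
    dummy_convert = lambda utt, emo: [x * 1.01 for x in utt]
    dummy_score = lambda utt, emo: min(1.0, sum(utt) / 100.0)
    print(self_improving_conversion([1.0] * 50, "happy", dummy_convert, dummy_score))
```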
Date of Conference: 17-19 October 2024
Date Added to IEEE Xplore: 20 December 2024