Abstract:
We address the problem of synthesizing narrow-focus word-stress in TTS by controlling three prosodic features: duration, F0, and intensity profiles. To this end, we modify the prosody of a neutral carrier sentence to create narrow-focus stress on specific target words, controlling the three prosodic features at the word level and the syllable level. This control is realized in several ways, including i) transplantation from the ground truth of stressed words, elicited so as to create narrow focus on the specified target words; ii) control of the F0 contour through Fujisaki accent commands (extracted prior to the prosody modification); and iii) automatic rule-driven modification of the three prosodic features of the neutral speech, as derived from the above two methods. The effectiveness of the different prosodic features in realizing the target word-stress, termed the prosodic-differential here, is measured by ABX listening tests that yield both categorical decisions and graded responses, to determine how effective a particular prosodic feature, or a specific combination of features, is in realizing narrow-focus word-stress. This work is expected to yield fine control over the automatic modification of prosodic features for synthesizing narrow-focus word-stress in a TTS system.
Date of Conference: 02-04 March 2017
Date Added to IEEE Xplore: 23 October 2017