Abstract:
Formal parameters of speech prosody are investigated with regard to their ability to estimate the perceptual quality of text-to-speech (TTS) signals. The study is carried out for the German language using a broad database comprising a wide range of TTS systems and text materials. Eighteen purely acoustic markers, derived from F0 and vocalic/consonantal durations, are analysed individually and in combination via cross-validated regression models. The F0 slope within voiced segments proves particularly useful when integrated in a nonlinear fashion, whereas measures of durational variation perform comparatively weakly. The results highlight a strong potential for instrumental techniques for estimating TTS quality.
Published in: IEEE Signal Processing Letters (Volume: 19, Issue: 5, May 2012)