Abstract
While developing lexical resources for a particular language variety (Viennese), we experimented with a set of five different phonetic encodings, termed phone sets, for unit selection speech synthesis. We started with a very rich phone set, based on phonological considerations and covering as much phonetic variability as possible, and then reduced it to smaller sets by applying transformation rules that map or merge phone symbols. The optimal trade-off was determined by measuring the phone error rates of automatically learnt grapheme-to-phone rules and by a perceptual evaluation of 27 representative synthesized sentences. Further, we describe a method to semi-automatically enlarge the lexical resources for the target language variety using a lexicon base for Standard Austrian German.
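The idea of reducing a phone set with mapping rules and scoring grapheme-to-phone output by phone error rate can be illustrated with a minimal Python sketch. It is not the implementation used in the chapter: the phone symbols, the `MERGE_RULES` table, and the function names `reduce_phones`, `edit_distance`, and `phone_error_rate` are all hypothetical, and the error rate is assumed here to be the Levenshtein distance between predicted and reference phone sequences divided by the number of reference phones.

```python
from typing import Dict, List, Sequence

# Hypothetical merge rules: each rich-set symbol maps to a reduced-set symbol.
# The symbols are illustrative only and are not taken from the chapter.
MERGE_RULES: Dict[str, str] = {
    "a:": "a",  # drop a length distinction
    "E": "e",   # merge two vowel qualities
}


def reduce_phones(phones: Sequence[str], rules: Dict[str, str]) -> List[str]:
    """Map each phone through the rules; unlisted phones pass through unchanged."""
    return [rules.get(p, p) for p in phones]


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def phone_error_rate(refs: Sequence[Sequence[str]],
                     hyps: Sequence[Sequence[str]]) -> float:
    """Total edit errors divided by the total number of reference phones."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total if total else 0.0


# A prediction that misses only a length distinction counts as an error
# in the rich set but not after both sides are mapped to the reduced set.
refs = [["h", "a:", "s"]]
hyps = [["h", "a", "s"]]
per_rich = phone_error_rate(refs, hyps)
per_reduced = phone_error_rate(
    [reduce_phones(r, MERGE_RULES) for r in refs],
    [reduce_phones(h, MERGE_RULES) for h in hyps],
)
```

In this toy case the reduced set yields a lower error rate, but it also discards a distinction that may matter perceptually, which is why a measure of this kind is paired with a listening evaluation.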
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Pucher, M., Neubarth, F., Strom, V. (2010). Optimizing Phonetic Encoding for Viennese Unit Selection Speech Synthesis. In: Esposito, A., Campbell, N., Vogel, C., Hussain, A., Nijholt, A. (eds) Development of Multimodal Interfaces: Active Listening and Synchrony. Lecture Notes in Computer Science, vol 5967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12397-9_17
DOI: https://doi.org/10.1007/978-3-642-12397-9_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12396-2
Online ISBN: 978-3-642-12397-9
eBook Packages: Computer Science (R0)