Optimizing Phonetic Encoding for Viennese Unit Selection Speech Synthesis

Pucher, Michael; Neubarth, Friedrich; Strom, Volker

doi:10.1007/978-3-642-12397-9_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5967)

Abstract

While developing lexical resources for a particular language variety (Viennese), we experimented with a set of five phonetic encodings, termed phone sets, for unit selection speech synthesis. We started with a very rich phone set, based on phonological considerations and covering as much phonetic variability as possible, and then reduced it to smaller sets by applying transformation rules that map or merge phone symbols. The optimal trade-off was determined by measuring the phone error rates of automatically learnt grapheme-to-phone rules and by a perceptual evaluation of 27 representative synthesized sentences. Further, we describe a method for semi-automatically enlarging the lexical resources for the target language variety using a lexicon base for Standard Austrian German.
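The two mechanisms named in the abstract, reducing a rich phone set by mapping or merging symbols and scoring predicted transcriptions by phone error rate, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the phone symbols, the `MERGE_RULES` table, and the function names are hypothetical and are not taken from the chapter, and the error rate is assumed to be the usual Levenshtein distance divided by the number of reference phones, which the abstract does not spell out.

```python
from typing import Dict, List, Sequence

# Hypothetical merge rules: each symbol of a rich phone set maps to a symbol
# of a smaller set. These are placeholders, not the chapter's actual rules.
MERGE_RULES: Dict[str, str] = {
    "a:": "a",  # assumed example: drop a vowel length distinction
    "e~": "e",  # assumed example: drop a nasalization distinction
}


def reduce_phones(phones: Sequence[str], rules: Dict[str, str]) -> List[str]:
    """Apply the symbol mapping; symbols without a rule are kept unchanged."""
    return [rules.get(p, p) for p in phones]


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def phone_error_rate(refs: Sequence[Sequence[str]],
                     hyps: Sequence[Sequence[str]]) -> float:
    """Total edit distance divided by the total number of reference phones."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total if total else 0.0


# Toy data: one reference transcription and one predicted transcription.
refs = [["h", "a:", "s"]]
hyps = [["h", "a", "s"]]

per_rich = phone_error_rate(refs, hyps)
per_reduced = phone_error_rate(
    [reduce_phones(r, MERGE_RULES) for r in refs],
    [reduce_phones(h, MERGE_RULES) for h in hyps],
)
print(per_rich, per_reduced)
```

In this toy case the predicted transcription differs from the reference only in a distinction that the merge rules remove, so the error rate falls once both sides are mapped to the smaller set. A smaller set is therefore easier to predict but encodes less phonetic detail, which is the trade-off the abstract refers to.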

Author information

Authors and Affiliations

Telecommunications Research Center Vienna (ftw.), Vienna, Austria
Michael Pucher
Austrian Research Institute for Artificial Intelligence (OFAI), Vienna, Austria
Friedrich Neubarth
Centre for Speech Technology Research (CSTR), University of Edinburgh, UK
Volker Strom

Editor information

Editors and Affiliations

Second University of Naples, and IIASS, Via Pellegrino, 84019, Vietri sul Mare, SA, Italy
Anna Esposito
Centre for Language and Communication Studies, Trinity College, The University of Dublin, Dublin 2, Ireland
Nick Campbell & Carl Vogel
Department of Computing Science & Mathematics, University of Stirling, FK9 4LA, Stirling, Scotland, UK
Amir Hussain
Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, P.O. Box 217, 7500 AE, Enschede, The Netherlands
Anton Nijholt

About this chapter

Cite this chapter

Pucher, M., Neubarth, F., Strom, V. (2010). Optimizing Phonetic Encoding for Viennese Unit Selection Speech Synthesis. In: Esposito, A., Campbell, N., Vogel, C., Hussain, A., Nijholt, A. (eds) Development of Multimodal Interfaces: Active Listening and Synchrony. Lecture Notes in Computer Science, vol 5967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12397-9_17

DOI: https://doi.org/10.1007/978-3-642-12397-9_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12396-2
Online ISBN: 978-3-642-12397-9
eBook Packages: Computer Science, Computer Science (R0)
