Speaker-Specific Pronunciation for Speech Synthesis

Latacz, Lukas; Mattheyses, Wesley; Verhelst, Werner

doi:10.1007/978-3-642-40585-3_63


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8082)

Included in the following conference series:

International Conference on Text, Speech and Dialogue


Abstract

A pronunciation lexicon for speech synthesis is a key component of a modern speech synthesizer: it contains the orthography and phonemic transcriptions of a large number of words. A lexicon may contain words with multiple pronunciations, such as reduced and full forms of (function) words, homographs, or other words with several acceptable pronunciations, such as foreign words or names. Pronunciation variants should therefore be taken into account both during voice building (e.g., segmentation and labeling of a speech database) and during synthesis.
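To illustrate why variants matter during segmentation, the sketch below enumerates every phone sequence that a multi-variant lexicon allows for a sentence. This is the candidate set an automatic aligner would have to choose between. The toy lexicon entries, the phone symbols and the function name `candidate_transcriptions` are hypothetical and are not taken from the paper.

```python
from itertools import product

# Hypothetical toy lexicon (illustrative entries, CMU-style phone symbols):
# orthography -> list of pronunciation variants, each a tuple of phones.
LEXICON = {
    "the": [("dh", "ah"), ("dh", "iy")],            # reduced and full function-word forms
    "read": [("r", "iy", "d"), ("r", "eh", "d")],   # homograph (present vs. past tense)
    "book": [("b", "uh", "k")],                     # single pronunciation
}


def candidate_transcriptions(words, lexicon):
    """Enumerate every phone sequence the lexicon allows for a word sequence.

    An aligner would score each candidate against the recording and keep the
    best one; the number of candidates is the product of the per-word variant counts.
    """
    per_word = [lexicon[w.lower()] for w in words]
    candidates = []
    for combo in product(*per_word):
        # Concatenate the chosen variant of each word into one phone sequence.
        candidates.append(tuple(phone for variant in combo for phone in variant))
    return candidates


if __name__ == "__main__":
    for cand in candidate_transcriptions(["Read", "the", "book"], LEXICON):
        print(" ".join(cand))
```

For "Read the book" this yields four candidates (2 × 2 × 1). Real aligners typically encode such alternatives as a pronunciation network rather than enumerating them, because the count grows multiplicatively with sentence length.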

In this paper we outline a strategy to deal with these variants automatically, resulting in a speaker-specific pronunciation. Based on a labeled speech database, the pronunciation lexicon is pruned to remove as much pronunciation variation as possible. This pruned lexicon can then be used to train speaker-specific letter-to-sound rules. If the speaker has uttered a word in different ways, its variants are not pruned; instead, a decision tree is trained for each such word and used to select the most suitable pronunciation during synthesis. We tested our approach on five speech databases, with two lexicons per speech database. The automatic selection of pronunciation variants yielded a small improvement over the baseline (always selecting the most common variant).
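The pruning step and the baseline can be sketched as follows, assuming the labeled database has already been reduced to a list of (word, realized variant) pairs. This is a simplified reading of the abstract, not the authors' implementation: the function names, the data layout, and the policy for words never observed in the database (keep all lexicon variants) are assumptions, and the per-word decision trees themselves are not reproduced.

```python
from collections import Counter, defaultdict


def prune_lexicon(lexicon, observations):
    """Remove variants the speaker never used in the labeled database.

    lexicon: dict mapping word -> list of variants (tuples of phone symbols).
    observations: iterable of (word, variant) pairs from the labeled database.
    Returns the pruned lexicon and the per-word variant counts.
    """
    counts = defaultdict(Counter)
    for word, variant in observations:
        counts[word][variant] += 1
    pruned = {}
    for word, variants in lexicon.items():
        word_counts = counts.get(word, Counter())
        seen = [v for v in variants if word_counts[v] > 0]
        # Assumption: words never observed keep all of their lexicon variants.
        pruned[word] = seen if seen else list(variants)
    return pruned, dict(counts)


def words_with_observed_variation(counts):
    """Words the speaker realized in more than one way (per-word classifier candidates)."""
    return sorted(word for word, c in counts.items() if len(c) > 1)


def baseline_choice(word, pruned, counts):
    """Baseline: always pick the speaker's most frequent variant of the word."""
    word_counts = counts.get(word, Counter())
    # max() keeps the first variant on ties, so an unseen word falls back
    # to its first listed variant.
    return max(pruned[word], key=lambda v: word_counts[v])
```

Words left with a single variant after pruning give a consistent training set for letter-to-sound rules; the words returned by `words_with_observed_variation` are the ones for which a per-word selector (a decision tree in the paper) would replace `baseline_choice`.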

The research reported in this paper was partly supported by the projects IWT-SPACE, iMinds-RAILS, iMinds-SEGA and EC FP7 ALIZ-E (FP7-ICT-248116).



Author information

Authors and Affiliations

Dept. ETRO-DSSP, Vrije Universiteit Brussel, Brussels, Belgium
Lukas Latacz, Wesley Mattheyses & Werner Verhelst
Dept. of Future Media and Imaging, iMinds, Ghent, Belgium
Lukas Latacz & Werner Verhelst


Editor information

Editors and Affiliations

University of West Bohemia, 306 14, Pilsen, Czech Republic
Ivan Habernal & Václav Matoušek


About this paper

Cite this paper

Latacz, L., Mattheyses, W., Verhelst, W. (2013). Speaker-Specific Pronunciation for Speech Synthesis. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_63


DOI: https://doi.org/10.1007/978-3-642-40585-3_63
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40584-6
Online ISBN: 978-3-642-40585-3
eBook Packages: Computer Science (R0)
