Abstract
This paper describes the development of the sound database for an allophone-based model of English concatenative speech synthesis. The procedure for constructing the sound unit inventory is described and its main results are presented. At present, the optimized sound unit inventory of the allophonic database contains 1200 elements (1000 vowel allophones and 200 consonant allophones). The smoothness of the junctions between allophones indicates that the segmentation is of high quality. Reducing the number of database components through optimization does not affect the quality of the resulting synthesized speech, which, at the segmental level, can be evaluated as fairly high in terms of both naturalness and intelligibility.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Evgrafova, K. (2005). The Sound Database Formation for the Allophone-Based Model for English Concatenative Speech Synthesis. In: Matoušek, V., Mautner, P., Pavelka, T. (eds) Text, Speech and Dialogue. TSD 2005. Lecture Notes in Computer Science, vol 3658. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551874_28
DOI: https://doi.org/10.1007/11551874_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28789-6
Online ISBN: 978-3-540-31817-0
eBook Packages: Computer Science, Computer Science (R0)