Definition
Over the last decade, speech synthesis, the technology that enables machines to talk to humans, has become so natural-sounding that a naïve listener might assume they are listening to a recording of a live human speaker. Speech synthesis is not new; indeed, it took several decades to arrive where it is today. The field started from the idea of using physics-based models of the vocal tract, and it took many years of research to perfect the encapsulation of the vocal tract's acoustic properties as a “black box” in so-called formant synthesizers. Then, as computing technology grew ever more powerful, it became viable to take snippets of recorded speech directly and glue them together to create new sentences; systems built this way are called concatenative synthesizers. By combining this idea with now-available methods for fast search, a synthesizer can evaluate potentially millions of choices to find the optimal...
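One common way to make the fast search concrete is unit selection in the style of Hunt and Black: each target position has several candidate recorded units, and a dynamic-programming (Viterbi) search picks the sequence that minimizes the sum of target costs (how well a unit matches the specification) and join costs (how smoothly adjacent units concatenate), without enumerating every combination. The sketch below assumes that formulation; the function name `select_units`, the cost functions, and the use of plain pitch values as "units" are hypothetical simplifications, not the method of any particular synthesizer.

```python
from typing import Callable, Sequence, TypeVar

U = TypeVar("U")  # hypothetical candidate unit type (e.g. a recorded snippet)
T = TypeVar("T")  # hypothetical target specification type (e.g. desired pitch)


def select_units(
    targets: Sequence[T],
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[T, U], float],
    join_cost: Callable[[U, U], float],
) -> tuple[list[U], float]:
    """Choose one candidate per target, minimizing total target + join cost.

    Viterbi-style dynamic programming: O(N * K^2) cost evaluations for
    N targets with K candidates each, instead of K^N full sequences.
    """
    # best[i][j]: lowest cost of any partial sequence ending at candidates[i][j]
    best = [[target_cost(targets[0], u) for u in candidates[0]]]
    # back[i][j]: index of the best predecessor at position i - 1
    back: list[list[int]] = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        row: list[float] = []
        ptr: list[int] = []
        for u in candidates[i]:
            # Best predecessor for u, including the cost of joining it to u
            k = min(
                range(len(candidates[i - 1])),
                key=lambda p: best[i - 1][p] + join_cost(candidates[i - 1][p], u),
            )
            cost = best[i - 1][k] + join_cost(candidates[i - 1][k], u)
            row.append(cost + target_cost(targets[i], u))
            ptr.append(k)
        best.append(row)
        back.append(ptr)
    # Start from the cheapest final candidate and follow back-pointers
    j = min(range(len(best[-1])), key=lambda p: best[-1][p])
    total = best[-1][j]
    path: list[U] = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path, total


# Toy usage: targets and units are pitch values in Hz (hypothetical data)
path, total = select_units(
    targets=[100, 120, 110],
    candidates=[[95, 130], [118, 90], [111, 140]],
    target_cost=lambda t, u: abs(t - u),
    join_cost=lambda a, b: 0.5 * abs(a - b),
)
print(path, total)  # [95, 118, 111] 23.0
```

The key design point is that each position only keeps the best-scoring path into each candidate, so the search scales with the number of candidates squared per step rather than exponentially with sentence length.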
References
Schroeter, J.: Basic principles of speech synthesis. In: Benesty, J. (ed.) Springer Handbook of Speech Processing and Communication, Chap. 19 (2008)
Bader, J.L.: Presidents as pitchmen, and posthumous play-by-play. Commentary in the New York Times, August 9 (2001)
van Santen, J., Sproat, R., Olive, J., Hirschberg, J. (eds.): Progress in speech synthesis, section III. Springer, NY (1997)
Holmes, J.N.: Formant synthesizers: cascade or parallel? Speech Commun. 2(4), 251–273 (1983)
Sproat, R. (ed.): Multilingual text-to-speech synthesis: The Bell Labs approach. Kluwer Academic Publishers, Dordrecht MA (1998)
Hunt, A., Black, A.W.: Unit selection in a concatenative speech synthesis system using a large speech database. In: Proceedings of ICASSP-96, pp. 373–376, Atlanta, GA, USA (1996)
Forney, G.D.: The Viterbi algorithm. Proc. IEEE 61(3), 268–278 (1973)
Dutoit, T.: Corpus-based speech synthesis. In: Benesty, J. (ed.) Springer Handbook of Speech Processing and Communication, Chap. 21 (2008)
van Santen, J.: Prosodic processing. In: Benesty, J. (ed.) Springer Handbook of Speech Processing and Communication, Chap. 23 (2008)
Cosatto, E., Graf, H.P., Ostermann, J., Schroeter, J.: From audio-only to audio and video text-to-speech. Acta Acustica 90, 1084–1095 (2004)
Copyright information
© 2009 Springer Science+Business Media, LLC
About this entry
Cite this entry
Schroeter, J. (2009). Voice Sample Synthesis. In: Li, S.Z., Jain, A. (eds) Encyclopedia of Biometrics. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-73003-5_6
DOI: https://doi.org/10.1007/978-0-387-73003-5_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-73002-8
Online ISBN: 978-0-387-73003-5
eBook Packages: Computer Science, Reference Module Computer Science and Engineering