Abstract
Synthesized speech that suits a realistic environment is hard to achieve. Synthesizing emotion is one way to obtain realistic, natural-sounding speech. Using the right emotion in synthesized speech makes it more effective and more natural for the listener. Implementing emotions is very difficult, as the word “emotion” has no single definition. There have been various attempts at emotional speech synthesis, but a perfect or near-ideal system has not been developed so far. This paper is an attempt to create an emotional speech synthesizer using an emotional database recorded in our own voice. We used unit selection and the CART method to implement it. We chose a classroom environment for teaching pre-school students with three emotions, i.e., neutral, happy, and sad. We tested our synthesizer with twenty listeners and found that they identified the speaker's emotional state to a significant degree.
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Gahlawat, M., Malik, A., Bansal, P. (2013). Expressive Speech Synthesis System Using Unit Selection. In: Prasath, R., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. Lecture Notes in Computer Science, vol 8284. Springer, Cham. https://doi.org/10.1007/978-3-319-03844-5_40
DOI: https://doi.org/10.1007/978-3-319-03844-5_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03843-8
Online ISBN: 978-3-319-03844-5
eBook Packages: Computer Science, Computer Science (R0)