Expressive Speech Synthesis System Using Unit Selection

  • Conference paper
  • In: Mining Intelligence and Knowledge Exploration

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8284)

Abstract

Realistic speech is hard to synthesize. Synthesizing emotion is one way to achieve realistic, natural-sounding speech: using the right emotion in synthesized speech makes it more effective and natural for the listener. Implementing emotions is difficult, however, because the word “emotion” has no single definition. There have been various attempts at emotional speech synthesis, but no perfect or near-ideal system has been developed so far. This paper is an attempt to create an emotional speech synthesizer using an emotional database recorded in our own voice. We implemented it using unit selection and the CART method. We chose a classroom environment for teaching pre-school students, with three emotions (neutral, happy, and sad), and tested our synthesizer with twenty listeners. The listeners identified the emotional state of the speaker to a significant degree.

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Gahlawat, M., Malik, A., Bansal, P. (2013). Expressive Speech Synthesis System Using Unit Selection. In: Prasath, R., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. Lecture Notes in Computer Science, vol 8284. Springer, Cham. https://doi.org/10.1007/978-3-319-03844-5_40

  • DOI: https://doi.org/10.1007/978-3-319-03844-5_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03843-8

  • Online ISBN: 978-3-319-03844-5

  • eBook Packages: Computer Science, Computer Science (R0)
