
Speech synthesis of emotions using vowel features of a speaker

  • Original Article
  • Published in Artificial Life and Robotics

Abstract

Methods for adding emotion to synthetic speech have recently received considerable attention in speech synthesis research. We previously proposed a case-based method for generating emotional synthetic speech that exploits three characteristics of emotional speech: the maximum amplitude and utterance duration of vowels, and the fundamental frequency. In the present study, we improve on that method by additionally controlling the fundamental frequency of the emotional synthetic speech. As an initial investigation, we used utterances of semantically neutral Japanese names. With the proposed method, emotional synthetic speech generated from the emotional speech of one male subject was discriminated with a mean accuracy of 83.9% when 18 subjects listened to synthetic utterances of the Japanese names "Taro" and "Hiroko" spoken as "angry," "happy," "neutral," "sad," or "surprised." The additional adjustment of the fundamental frequency gave the subjects a much clearer impression of the intended emotion.
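To make the prosodic control concrete, the Python sketch below shows one way to scale a vowel segment's maximum amplitude, duration, and fundamental frequency (F0) according to a target emotion. It is an illustration only: the librosa-based signal processing and the factors in EMOTION_PARAMS are assumptions made for this sketch, not the implementation or values reported in the paper.

    # Minimal sketch: per-vowel prosody modification for a target emotion.
    # The factors below are illustrative placeholders; the paper's case-based
    # method derives its features from a speaker's own emotional utterances.
    import numpy as np
    import librosa

    # (amplitude gain, duration ratio, F0 shift in semitones) per emotion
    EMOTION_PARAMS = {
        "angry":     (1.4, 0.85,  2.0),
        "happy":     (1.2, 0.90,  3.0),
        "neutral":   (1.0, 1.00,  0.0),
        "sad":       (0.7, 1.20, -2.0),
        "surprised": (1.3, 0.80,  4.0),
    }

    def modify_vowel(y, sr, emotion):
        """Scale one vowel segment's duration, F0, and peak amplitude."""
        gain, dur_ratio, semitones = EMOTION_PARAMS[emotion]
        # librosa's rate > 1 shortens a signal, so invert the duration ratio.
        y = librosa.effects.time_stretch(y, rate=1.0 / dur_ratio)
        # Shift the fundamental frequency by the given number of semitones.
        y = librosa.effects.pitch_shift(y, sr=sr, n_steps=semitones)
        # Scale the maximum amplitude, renormalizing if the gain would clip.
        y = y * gain
        peak = np.max(np.abs(y))
        return y / peak if peak > 1.0 else y

In a case-based pipeline like the one the abstract describes, each vowel segment of a neutral utterance would be modified in this manner using features measured from the same speaker's emotional speech, and the segments then reassembled into the emotional synthetic utterance.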



Acknowledgments

We would like to thank all the participants who cooperated with us in the experiments.

Author information

Correspondence to Yasunari Yoshitomi.

Cite this article

Boku, K., Asada, T., Yoshitomi, Y. et al. Speech synthesis of emotions using vowel features of a speaker. Artif Life Robotics 19, 27–32 (2014). https://doi.org/10.1007/s10015-013-0126-9
