Abstract
Recently, methods for adding emotion to synthetic speech have received considerable attention in speech synthesis research. We previously proposed a case-based method for generating emotional synthetic speech that exploits the maximum amplitude and utterance time of vowels and the fundamental frequency of emotional speech. In the present study, we improve on that method by additionally controlling the fundamental frequency of the emotional synthetic speech. As an initial investigation, we adopted the utterance of a Japanese name, which is semantically neutral. Using the proposed method, emotional synthetic speech generated from the emotional speech of one male subject was discriminated with a mean accuracy of 83.9% when 18 subjects listened to synthetic utterances of the Japanese names “Taro” and “Hiroko” spoken as “angry,” “happy,” “neutral,” “sad,” or “surprised.” The additional adjustment of the fundamental frequency in the proposed method gave the subjects a much clearer impression of the intended emotion in the synthetic speech.
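
As a rough illustration of the per-vowel features the method relies on, the following Python sketch shows how a maximum amplitude, an utterance time, and a fundamental-frequency estimate might be extracted from a single vowel segment. This is our own illustration, not the authors' implementation; the function name vowel_features and the autocorrelation-based F0 estimate are assumptions made for the example.

# Hypothetical sketch: extracting the three vowel features named in the abstract
# (maximum amplitude, utterance time, fundamental frequency).
import numpy as np

def vowel_features(segment: np.ndarray, fs: int, f0_range=(60.0, 400.0)):
    """Return (max_amplitude, duration_s, f0_hz) for one vowel segment."""
    max_amp = float(np.max(np.abs(segment)))
    duration = len(segment) / fs

    # Crude F0 estimate: autocorrelation peak restricted to a plausible pitch range.
    seg = segment - np.mean(segment)
    ac = np.correlate(seg, seg, mode="full")[len(seg) - 1:]
    lag_min = int(fs / f0_range[1])
    lag_max = int(fs / f0_range[0])
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return max_amp, duration, fs / lag

if __name__ == "__main__":
    fs = 16000
    t = np.arange(int(0.2 * fs)) / fs
    vowel = 0.5 * np.sin(2 * np.pi * 150.0 * t)   # 200-ms synthetic "vowel" at 150 Hz
    print(vowel_features(vowel, fs))              # roughly (0.5, 0.2, ~150 Hz)

In a case-based scheme of the kind described above, such features measured on emotional and neutral speech could be compared to obtain per-vowel scaling factors for amplitude, duration, and F0.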









Acknowledgments
We would like to thank all the participants who cooperated with us in the experiments.
Cite this article
Boku, K., Asada, T., Yoshitomi, Y. et al. Speech synthesis of emotions using vowel features of a speaker. Artif Life Robotics 19, 27–32 (2014). https://doi.org/10.1007/s10015-013-0126-9