Generating and manipulating emotional synthetic speech on a personal computer

Henton, Caroline; Edelman, Bradley

doi:10.1007/BF00429747

Published: September 1996

Multimedia Tools and Applications, Volume 3, pages 105–125 (1996)

Caroline Henton¹ &
Bradley Edelman²

Abstract

Against a background of incorporating a talking head into a role-playing simulator, enhancements are proposed for users of the simulator and of text-to-speech systems in general. The first is the ability to generate vocal emotion in synthetic speech using a limited number of prosodic parameters with a concatenative speech synthesizer. The second enhancement allows for vocal emotions to be included during the authoring of text for output by the text-to-speech system. Vocal emotions can be represented visually, and can be manipulated directly by the user. Applications such as training simulators that use synthetic speech can be made more ‘human’ by the addition of emotions. A graphical editor for specifying and directly manipulating the speech improves the authoring environment of these applications.
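The idea of expressing vocal emotion through a limited number of prosodic parameters, attached to text at authoring time, can be sketched as follows. This is only an illustration under stated assumptions: the four parameters, the emotion names, the scaling factors, and the `<prosody ...>` markup are all hypothetical and are not taken from the article or from any particular synthesizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prosody:
    """A small, hypothetical set of prosodic controls for a synthesizer."""
    pitch_mean: float   # baseline pitch, in Hz
    pitch_range: float  # pitch excursion around the baseline, in Hz
    rate: float         # speaking rate, in words per minute
    volume: float       # loudness, from 0.0 to 1.0


# Illustrative multipliers only (pitch_mean, pitch_range, rate, volume).
# These values are assumptions for the sketch, not measured or published data.
EMOTION_SCALES = {
    "neutral": (1.0, 1.0, 1.0, 1.0),
    "happy": (1.2, 1.5, 1.1, 1.0),
    "sad": (0.9, 0.5, 0.8, 0.8),
    "angry": (1.1, 1.4, 1.2, 1.2),
}


def apply_emotion(base: Prosody, emotion: str) -> Prosody:
    """Scale the neutral settings by the factors for the chosen emotion."""
    pitch, rng, rate, vol = EMOTION_SCALES[emotion]
    return Prosody(
        pitch_mean=base.pitch_mean * pitch,
        pitch_range=base.pitch_range * rng,
        rate=base.rate * rate,
        volume=min(1.0, base.volume * vol),  # clamp to the valid range
    )


def annotate(text: str, emotion: str, base: Prosody) -> str:
    """Wrap text in a hypothetical markup tag carrying the prosodic settings."""
    p = apply_emotion(base, emotion)
    return (
        f"<prosody pitch={p.pitch_mean:.0f} range={p.pitch_range:.0f} "
        f"rate={p.rate:.0f} volume={p.volume:.2f}>{text}</prosody>"
    )


if __name__ == "__main__":
    neutral_voice = Prosody(pitch_mean=200.0, pitch_range=40.0, rate=180.0, volume=0.8)
    print(annotate("I am sorry.", "sad", neutral_voice))
```

Keeping the emotion definitions in a single table means an authoring tool could expose each factor as a directly adjustable control, which is the kind of manipulation the abstract describes for a graphical editor.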

Author information

Authors and Affiliations

Voice Processing Corporation, 1 Main Street, 02142, Cambridge, MA, USA
Caroline Henton
Internet Products Group, Adobe Systems Inc., 1585 Charleston Road, P.O. Box 7900, 94039, Mountain View, CA, USA
Bradley Edelman

About this article

Cite this article

Henton, C., Edelman, B. Generating and manipulating emotional synthetic speech on a personal computer. Multimed Tools Appl 3, 105–125 (1996). https://doi.org/10.1007/BF00429747

Issue Date: September 1996
DOI: https://doi.org/10.1007/BF00429747
