Abstract
Against the background of incorporating a talking head into a role-playing simulator, two enhancements are proposed for users of the simulator and of text-to-speech systems in general. The first is the ability to generate vocal emotion in synthetic speech by controlling a limited number of prosodic parameters of a concatenative speech synthesizer. The second allows vocal emotions to be specified while authoring the text that the text-to-speech system will speak. Vocal emotions can be represented visually and manipulated directly by the user. Adding emotions can make applications that use synthetic speech, such as training simulators, sound more ‘human’. A graphical editor for specifying and directly manipulating the speech improves the authoring environment of these applications.
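The idea of driving emotion through a small set of prosodic parameters, and of attaching emotions to passages of authored text, can be illustrated with a minimal Python sketch. It assumes four hypothetical controls (mean pitch, pitch range, speaking rate, volume), invented per-emotion multipliers, and an intensity value in [0, 1] standing in for a user-adjustable control in an editor. None of the names, parameters, or numbers come from the paper, and the sketch only computes target parameter values; it does not drive an actual concatenative synthesizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prosody:
    """A small set of prosodic controls (hypothetical names and units)."""
    pitch_mean_hz: float   # average fundamental frequency
    pitch_range_hz: float  # size of F0 excursions around the mean
    rate_wpm: float        # speaking rate in words per minute
    volume: float          # output level, 0.0 to 1.0


# Hypothetical multipliers relative to a neutral voice, in the order
# (pitch mean, pitch range, rate, volume). The magnitudes are invented
# for illustration and are not values from the paper.
EMOTION_SCALES = {
    "neutral": (1.0, 1.0, 1.0, 1.0),
    "anger": (1.1, 1.5, 1.1, 1.2),
    "sadness": (0.9, 0.5, 0.8, 0.8),
}


def apply_emotion(base: Prosody, emotion: str, intensity: float = 1.0) -> Prosody:
    """Return the prosody for `emotion` at the given intensity.

    intensity 0.0 leaves the baseline unchanged; 1.0 applies the full multiplier.
    """
    if emotion not in EMOTION_SCALES:
        raise ValueError(f"unknown emotion: {emotion!r}")
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be between 0 and 1")

    def blend(scale: float) -> float:
        # Interpolate linearly between "no change" (1.0) and the full multiplier.
        return 1.0 + intensity * (scale - 1.0)

    pitch, spread, rate, vol = EMOTION_SCALES[emotion]
    return Prosody(
        pitch_mean_hz=base.pitch_mean_hz * blend(pitch),
        pitch_range_hz=base.pitch_range_hz * blend(spread),
        rate_wpm=base.rate_wpm * blend(rate),
        volume=min(1.0, base.volume * blend(vol)),  # clamp to the valid range
    )


def render_script(base: Prosody, segments: list[tuple[str, str, float]]) -> list[tuple[str, Prosody]]:
    """Resolve an emotion-annotated script of (text, emotion, intensity) segments
    into (text, prosody) pairs, one per segment."""
    return [(text, apply_emotion(base, emotion, intensity))
            for text, emotion, intensity in segments]
```

For example, `render_script(Prosody(120.0, 40.0, 180.0, 0.7), [("I see.", "neutral", 1.0), ("That is unacceptable.", "anger", 0.6)])` yields one parameter set per annotated segment. Interpolating linearly toward each multiplier keeps intensity 0 identical to the neutral voice, which makes a slider-style control easy to reason about.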
Henton, C., Edelman, B. Generating and manipulating emotional synthetic speech on a personal computer. Multimed Tools Appl 3, 105–125 (1996). https://doi.org/10.1007/BF00429747