Abstract
There is a need for speech synthesis to be more emotionally expressive. Implicit control of a subset of affective vocal effects could be advantageous for some applications. Physiological measures associated with autonomic nervous system (ANS) activity are potential candidates for such input. This paper describes a pilot study investigating physiological sensor readings as potential input signals for modulating the speech synthesis of affective utterances composed by human users. A small corpus of audio, heart rate, and skin conductance data has been collected from eight doctoral student oral defenses. Planned analysis and research phases are outlined.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hennig, S. (2011). Candidacy of Physiological Measurements for Implicit Control of Emotional Speech Synthesis. In: D’Mello, S., Graesser, A., Schuller, B., Martin, JC. (eds) Affective Computing and Intelligent Interaction. ACII 2011. Lecture Notes in Computer Science, vol 6975. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24571-8_22
DOI: https://doi.org/10.1007/978-3-642-24571-8_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24570-1
Online ISBN: 978-3-642-24571-8
eBook Packages: Computer Science, Computer Science (R0)