2005 Special Issue
Emotion recognition in human–computer interaction
Introduction
As computers and computer-based applications become more and more sophisticated and increasingly involved in our everyday life, whether at a professional, a personal or a social level, it becomes ever more important that we are able to interact with them in a natural way, similar to the way we interact with other human agents. The most crucial feature of human interaction that grants naturalism to the process is our ability to infer the emotional states of others based on covert and/or overt signals of those emotional states. This allows us to adjust our responses and behavioural patterns accordingly, thus ensuring convergence and optimisation of the interactive process. This paper is based on the theoretical foundations of, and work carried out within, the collaborative EC project called ERMIS (for emotionally rich man–machine intelligent system), in which we have been involved recently. The aim of ERMIS is the development of a hybrid system capable of recognising people's emotions based on information from their faces and speech, both from the point of view of their prosodic and lexical content. In particular, we will develop a neural network architecture and a simulation demonstrating its recognition of emotions in speech and face stimuli. This work will lead to open questions indicating further lines of enquiry.
The literature on emotions is rich and spans several disciplines, often with no obvious overlap or consolidating outlook. Our view of emotions has thus been shaped by the philosophy of Rene Descartes, the biological concepts of Charles Darwin and the psychological theories of William James, to mention only a few of the major figures of the human sciences. Such theoretical concepts should be used as guidelines in putting together an automatic emotion recognition system (such as ERMIS), provided that they are shown to be consistent with more recent knowledge on emotions, such as that stemming from the modern neurosciences. Indeed, recent technological advances have allowed us to probe the human brain, and particularly the circuitry involved in recognising emotions, yielding a more detailed understanding of the function and structure of emotion recognition in the brain. At the same time, technological advances have significantly improved the signal processing techniques applied to the analysis of the physical correlates of emotions (such as facial and vocal features), thus allowing efficient multi-modal emotion recognition interfaces to be built.
The possible applications of an interface capable of assessing human emotional states are numerous. One of the uses of such an interface is to enhance human judgement of emotion in situations where objectivity and accuracy are required. Lie detection is an obvious example of such situations, although improving on human performance would require a very effective emotion recognition system. Another example is clinical studies of schizophrenia, and particularly the diagnosis of flattened affect, which so far relies on the psychiatrist's subjective judgement of a subject's emotionality based on various physiological cues. An automatic emotion-sensitive system could augment these judgements, so minimising the dependence of the diagnostic procedure on an individual psychiatrist's perception of emotionality. More generally along those lines, automatic emotion detection and classification can be used in a wide range of psychological and neuro-physiological studies of human emotional expression that so far rely on subjects' self-report of their emotional state, which often proves problematic. In a professional environment, enriching a teleconference session with real-time information on the emotional state of the participants could compensate for the reduced naturalism of the medium, again assisting humans in their emotional discriminatory capacity.
Another use of an emotion-sensitive system could be to embed it in an automatic tutoring application. An emotion-sensitive automatic tutor can interactively adjust the content of the tutorial and the speed at which it is delivered based on whether the user finds it boring and dreary, exciting and thrilling, or even unapproachable and daunting. The system could recommend a break when signs of weariness are detected. Similarly, emotion-sensitivity can be added to automatic customer services, call centres or personal assistants, for example, to help detect frustration and avoid further irritation, with the option of passing the interaction over to a human, or even terminating it altogether. One could also imagine an emotion-responsive car that can alert the driver when it detects signs of stress or anger that could impair their driving abilities.
The most obvious commercial application of emotion-sensitive systems is the game and entertainment industry with either interactive games that offer the sensation of naturalistic human-like interaction, or pets, dolls and so on that are sensitive to the owner's mood and can respond accordingly. Finally, owing to the shared basis of human emotion recognition and emotional expression, understanding and developing automatic systems for emotion recognition can assist in generating faces and/or voices endowed with convincingly human-like emotional qualities. This can in turn lead to a fully interactive system or agent that can perceive emotion and respond emotionally. This would thereby take human–machine interaction a step closer to human–human interaction.
In the sections that follow we will briefly review some of the prominent theories of emotions and the issues that arise from them. We will then turn to the more modern theoretical advances and experimental evidence and discuss issues that arise separately on the side of the sender and on the side of the receiver. After that we will explore the nature of the emotional features from the various modalities and discuss the available data for training and testing. Finally, we will present an artificial neural network architecture for fusing emotional information from the various modalities under attentional modulation and present the results obtained in the ERMIS framework through this neural network.
Section snippets
The psychological tradition
In our effort to construct an automatic emotion recogniser, it is important to examine the ideas proposed on the nature of emotions insofar as they shape the way emotional states are described. These ideas can guide us in determining what an emotional state is and what the relevant features are which distinguish this state from others. It is also crucial to delineate the nature of the mapping of these relevant features to the state's internal representation so that effective models of this
Training and testing material
An automatic emotion recognition system that employs learning architectures (e.g. neural networks), such as the one developed for ERMIS, requires sufficient training and testing material. This material should contain two streams: an input stream and an output stream. The input stream would comprise the extracted relevant features from the various modalities (prosody, faces, words, etc.) and the output would comprise the emotional class or category or more generally the emotional representation
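The pairing of an input stream (per-modality feature vectors) with an output stream (a target emotional representation) can be sketched as follows. This is a minimal illustrative data structure, not the ERMIS specification; the field names, feature choices and the use of an activation–evaluation target are assumptions for the example:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingSample:
    # Input stream: one feature vector per modality (names are illustrative)
    prosody: List[float]   # e.g. pitch and energy statistics from the speech signal
    face: List[float]      # e.g. facial animation parameters for a frame
    words: List[float]     # e.g. lexical affect scores for the words spoken
    # Output stream: target emotional representation, here a point in
    # a continuous activation-evaluation space
    activation: float
    evaluation: float


# A single (toy) training pair: features in, emotional target out
sample = TrainingSample(
    prosody=[0.42, 1.7, 0.03],
    face=[0.1, -0.2, 0.05],
    words=[0.6],
    activation=0.3,
    evaluation=-0.1,
)
```

A learning architecture would then be trained to map the three input vectors onto the two output coordinates, one such pair per labelled time window.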
The architecture
One of the most important effects of emotions is their ability to capture attention, whether it is 'bottom-up' attention directed to stimuli or events that have been automatically registered as emotional, or 'top-down' attention re-engaged to a stimulus or event that has been evaluated as important to current needs and goals after a cognitive appraisal mediated by a complex emotional–cognitive network. This emotion–attention interaction has been extensively discussed in the previous
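One simple way to picture attentional modulation over modalities is a gating scheme in which each modality's features are weighted by an attention coefficient before fusion. The sketch below is a hedged illustration of that idea, not the actual ERMIS/ANNA architecture; the softmax gating and the salience scores are assumptions introduced for the example:

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def fuse_with_attention(modality_features, saliences):
    """Scale each modality's feature vector by an attention weight derived
    from a per-modality salience score, then concatenate the results."""
    weights = softmax(saliences)
    fused = []
    for feats, w in zip(modality_features, weights):
        fused.extend(f * w for f in feats)
    return fused


# Toy feature vectors for voice, face and words, plus salience scores
# (in a trained system the saliences would come from a learned gating net)
features = [[0.2, 0.5], [1.0, -0.3], [0.7]]
saliences = [2.0, 0.5, 1.0]
fused = fuse_with_attention(features, saliences)
```

In this picture, 'bottom-up' capture corresponds to a modality driving its own salience score up, while 'top-down' re-engagement corresponds to the scores being set by a separate appraisal process.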
General features of the analysis
A selection of SALAS sessions was analysed by the respective ERMIS partners, who extracted the relevant features from the voice, face and word streams. These sessions were also evaluated for their emotional content by four subjects using the FEELTRACE program. The resulting streams of input and output data were in turn analysed using ANNA. The results are shown in Table 1, which gives the full set of ASSESS–FAPs–DAL training results, as well as the testing results.
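With four raters producing continuous emotional evaluations, the individual traces must be combined into a single target stream before training. A minimal sketch of one plausible scheme, per-time-point averaging, is given below; it assumes the raters' (activation, evaluation) traces are already aligned to common time points, and is illustrative rather than the procedure actually used in ERMIS:

```python
def average_traces(traces):
    """Average several raters' continuous (activation, evaluation) traces,
    assuming all traces share the same sequence of time points."""
    n = len(traces)
    length = len(traces[0])
    avg = []
    for t in range(length):
        act = sum(tr[t][0] for tr in traces) / n
        ev = sum(tr[t][1] for tr in traces) / n
        avg.append((act, ev))
    return avg


# Two toy rater traces, each with two time points (activation, evaluation)
rater_traces = [
    [(0.2, -0.1), (0.4, 0.0)],
    [(0.0, -0.3), (0.2, -0.2)],
]
target = average_traces(rater_traces)  # per-time-point consensus labels
```

More elaborate schemes (rater weighting, time-warping of misaligned traces) are possible, but the averaged trace already provides a usable output stream for a learning architecture.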
To explain what has been
Conclusions
In this paper we have introduced the framework of the EC project ERMIS. The aim of this project was to build an automatic emotion recognition system able to exploit multimodal emotional markers such as those embedded in the voice, face and words spoken. We discussed the numerous potential applications of such a system for industry as well as in academia. We then turned to the psychological literature to help lay the theoretical foundation of our system and make use of insights from the various
Acknowledgements
We would like to acknowledge help from all of the partners in ERMIS, especially Roddie Cowie and Ellie Douglas-Cowie from QUB, as well as our colleagues from NTUA led by Stefanos Kollias. We would also like to acknowledge financial help from the EC under project ERMIS, under which this work was carried out.