Neural Networks

Volume 18, Issue 4, May 2005, Pages 389–405

2005 Special Issue
Emotion recognition in human–computer interaction

https://doi.org/10.1016/j.neunet.2005.03.006

Abstract

In this paper, we outline the approach we have developed to construct an emotion-recognising system. It is based on guidance from psychological studies of emotion, as well as from the nature of emotion in its interaction with attention. A neural network architecture is constructed to handle the fusion of different modalities (facial features, prosody and lexical content in speech). Results from the network are given and their implications discussed, as are future directions for the research.

Introduction

As computers and computer-based applications become more and more sophisticated and increasingly involved in our everyday life, whether at a professional, a personal or a social level, it becomes ever more important that we are able to interact with them in a natural way, similar to the way we interact with other human agents. The most crucial feature of human interaction that grants naturalism to the process is our ability to infer the emotional states of others from covert and/or overt signals of those states. This allows us to adjust our responses and behavioural patterns accordingly, thus ensuring convergence and optimisation of the interactive process. This paper is based on the theoretical foundations of, and work carried out within, the collaborative EC project ERMIS (Emotionally Rich Man–machine Intelligent System), in which we have been involved recently. The aim of ERMIS is the development of a hybrid system capable of recognising people's emotions from their faces and speech, the latter in terms of both its prosodic and its lexical content. In particular, we develop a neural network architecture and a simulation demonstrating its recognition of emotions in speech and facial stimuli, leading to open questions that indicate further lines of enquiry.

The literature on emotions is rich and spans several disciplines, often with no obvious overlap or consolidating outlook. Our view of emotions has thus been shaped by the philosophy of René Descartes, the biological concepts of Charles Darwin and the psychological theories of William James, to mention only a few of the major figures of the human sciences. Such theoretical concepts should be used as guidelines in putting together an automatic emotion recognition system (such as ERMIS), provided that they are shown to be consistent with more recent knowledge on emotions, such as that stemming from the modern neurosciences. Indeed, recent technological advances have allowed us to probe the human brain, and particularly the circuitry involved in recognising emotions, yielding a more detailed understanding of the function and structure of emotion recognition in the brain. At the same time, technological advances have significantly improved the signal processing techniques applied to the analysis of the physical correlates of emotions (such as facial and vocal features), thus allowing efficient multi-modal emotion recognition interfaces to be built.

The possible applications of an interface capable of assessing human emotional states are numerous. One use of such an interface is to enhance human judgement of emotion in situations where objectivity and accuracy are required. Lie detection is an obvious example, although improving on human performance would require a very effective emotion recognition system. Another example is clinical studies of schizophrenia, and particularly the diagnosis of flattened affect, which so far relies on psychiatrists' subjective judgement of subjects' emotionality based on various physiological cues. An automatic emotion-sensitive system could augment these judgements, minimising the dependence of the diagnostic procedure on an individual psychiatrist's perception of emotionality. More generally along these lines, automatic emotion detection and classification can be used in a wide range of psychological and neuro-physiological studies of human emotional expression that so far rely on subjects' self-reports of their emotional state, which often prove problematic. In a professional environment, enriching a teleconference session with real-time information on the emotional state of the participants could compensate for the reduced naturalism of the medium, again assisting humans in their emotional discriminatory capacity.

Another use of an emotion-sensitive system could be to embed it in an automatic tutoring application. An emotion-sensitive automatic tutor can interactively adjust the content of a tutorial and the speed at which it is delivered according to whether the user finds it boring and dreary, exciting and thrilling, or even unapproachable and daunting. The system could recommend a break when signs of weariness are detected. Similarly, emotion-sensitivity can be added to automatic customer services, call centres or personal assistants, for example, to help detect frustration and avoid further irritation, with the option of passing the interaction over to a human or even terminating it altogether. One could also imagine an emotion-responsive car that alerts the driver when it detects signs of stress or anger that could impair their driving.

The most obvious commercial application of emotion-sensitive systems is in the games and entertainment industry, with either interactive games that offer the sensation of naturalistic human-like interaction, or pets, dolls and so on that are sensitive to the owner's mood and can respond accordingly. Finally, owing to the shared basis of human emotion recognition and emotional expression, understanding and developing automatic systems for emotion recognition can assist in generating faces and/or voices endowed with convincingly human-like emotional qualities. This can in turn lead to a fully interactive system or agent that can both perceive emotion and respond emotionally, thereby taking human–machine interaction a step closer to human–human interaction.

In the sections that follow we will briefly review some of the prominent theories of emotions and the issues that arise from them. We will then turn to the more modern theoretical advances and experimental evidence and discuss issues that arise separately on the side of the sender and on the side of the receiver. After that we will explore the nature of the emotional features from the various modalities and discuss the available data for training and testing. Finally, we will present an artificial neural network architecture for fusing emotional information from the various modalities under attentional modulation and present the results obtained in the ERMIS framework through this neural network.

Section snippets

The psychological tradition

In our effort to construct an automatic emotion recogniser, it is important to examine the ideas proposed on the nature of emotions insofar as they shape the way emotional states are described. These ideas can guide us in determining what an emotional state is and which relevant features distinguish this state from others. It is also crucial to delineate the nature of the mapping of these relevant features to the state's internal representation so that effective models of this…

Training and testing material

An automatic emotion recognition system that employs learning architectures (e.g. neural networks), such as the one developed for ERMIS, requires sufficient training and testing material. This material should contain two streams: an input stream and an output stream. The input stream would comprise the extracted relevant features from the various modalities (prosody, faces, words, etc.) and the output would comprise the emotional class or category or, more generally, the emotional representation…
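To make this two-stream structure concrete, the Python sketch below shows how a single training example might pair multimodal input features with an emotional target. The class, the feature dimensions and the use of FEELTRACE-style activation–evaluation coordinates as the output representation are illustrative assumptions, not the actual ERMIS data format.

    # Illustrative sketch only: names and dimensions are assumptions,
    # not the ERMIS data format.
    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class EmotionExample:
        prosody: np.ndarray   # e.g. ASSESS-style acoustic features
        face: np.ndarray      # e.g. FAP-style facial parameters
        lexical: np.ndarray   # e.g. DAL-style word affect scores
        target: np.ndarray    # e.g. activation-evaluation coordinates

        def input_vector(self) -> np.ndarray:
            # Concatenate the modality streams into one input vector.
            return np.concatenate([self.prosody, self.face, self.lexical])

    # One hypothetical example: 10 prosodic, 12 facial and 2 lexical
    # features, mapped to a 2-D (activation, evaluation) target.
    example = EmotionExample(
        prosody=np.random.randn(10),
        face=np.random.randn(12),
        lexical=np.random.randn(2),
        target=np.array([0.4, -0.2]),
    )
    print(example.input_vector().shape)  # (24,)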

The architecture

One of the most important effects of emotions is their ability to capture attention, whether it is ‘bottom-up’ attention directed to stimuli or events that have been automatically registered as emotional, or ‘top-down’ attention re-engaged to a stimulus or event that, after a cognitive appraisal mediated by a complex emotional–cognitive network, has been evaluated as important to current needs and goals. This emotion–attention interaction has been extensively discussed in the previous…
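One way such attentional modulation of multimodal fusion could be realised is sketched below in NumPy: a learned gate scales each modality's hidden representation before the fused emotion output is computed. This is a hypothetical illustration of the gating idea, not the ANNA architecture itself; all dimensions and weights are arbitrary.

    # Sketch of gated multimodal fusion; an assumption, not ANNA itself.
    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def attend_and_fuse(modalities, W_hidden, W_gate, W_out):
        # Per-modality hidden representations.
        hidden = [np.tanh(W @ m) for W, m in zip(W_hidden, modalities)]
        fused = np.concatenate(hidden)
        # Attentional gate: one multiplicative weight per hidden unit.
        gate = sigmoid(W_gate @ fused)
        # Modulated fusion mapped to an emotion output.
        return np.tanh(W_out @ (gate * fused))

    # Hypothetical dimensions: prosody (10), face (12), lexical (2);
    # 8 hidden units per modality; 2-D activation-evaluation output.
    dims, n_hidden, n_out = [10, 12, 2], 8, 2
    W_hidden = [rng.standard_normal((n_hidden, d)) * 0.1 for d in dims]
    W_gate = rng.standard_normal((3 * n_hidden, 3 * n_hidden)) * 0.1
    W_out = rng.standard_normal((n_out, 3 * n_hidden)) * 0.1

    features = [rng.standard_normal(d) for d in dims]
    print(attend_and_fuse(features, W_hidden, W_gate, W_out))

Whether the gate is driven ‘bottom-up’ by the stimulus features themselves or ‘top-down’ by goal signals is exactly the design choice that the emotion–attention interaction above concerns.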

General features of the analysis

A selection of SALAS sessions was analysed by the respective ERMIS partners, who extracted the relevant features from the voice, face and word streams. These sessions were also evaluated for their emotional content by four subjects using the FEELTRACE program. The resulting streams of input and output data were in turn analysed using ANNA. The results are shown in Table 1, which gives the full set of ASSESS–FAPs–DAL training results, as well as the testing results.

To explain what has been…
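Read alongside Table 1, the setup described above amounts to supervised learning from fused ASSESS–FAPs–DAL feature vectors onto FEELTRACE-derived targets. The sketch below assumes plain least-squares regression onto the mean activation–evaluation trace of four raters; the data, dimensions and single-layer model are illustrative assumptions, not the actual ANNA training procedure.

    # Hedged sketch of the implied training setup; not the ANNA procedure.
    import numpy as np

    rng = np.random.default_rng(1)
    n_frames, n_feat = 200, 24                   # hypothetical sizes
    X = rng.standard_normal((n_frames, n_feat))  # fused feature vectors
    # Four hypothetical FEELTRACE raters, each giving a 2-D
    # (activation, evaluation) trace in [-1, 1].
    raters = rng.uniform(-1, 1, (4, n_frames, 2))
    Y = raters.mean(axis=0)                      # consensus target trace

    # Least-squares regression fitted by gradient descent.
    W = np.zeros((n_feat, 2))
    for _ in range(500):
        grad = X.T @ (X @ W - Y) / n_frames
        W -= 0.1 * grad

    print(f"training MSE: {np.mean((X @ W - Y) ** 2):.3f}")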

Conclusions

In this paper we have introduced the framework of the EC project ERMIS. The aim of this project was to build an automatic emotion recognition system able to exploit multimodal emotional markers such as those embedded in the voice, face and words spoken. We discussed the numerous potential applications of such a system in industry as well as in academia. We then turned to the psychological literature to help lay the theoretical foundation of our system and make use of insights from the various…

Acknowledgements

We would like to acknowledge help from all of the partners in ERMIS, especially Roddy Cowie and Ellen Douglas-Cowie from QUB, as well as our colleagues from NTUA led by Stefanos Kollias. We would also like to acknowledge financial support from the EC under the ERMIS project, within which this work was carried out.
