Is happy better than sad even if they are both non-adaptive? Effects of emotional expressions of talking-head interface agents

https://doi.org/10.1016/j.ijhcs.2006.09.005

Abstract

Hedonic preference and contextual appropriateness are two general principles governing humans’ emotional expressions. People generally prefer perceiving and expressing positive emotions to negative ones, but they also modulate their emotional expressions to be appropriate to social contexts. Although computer-based characters such as interface agents are able to express basic human emotions, they cannot yet automatically and effectively adapt their emotional expressions to the changing context. Would hedonic preference hold up without contextual appropriateness? A 2×2 mixed-design experiment (happy vs. sad expression × happy vs. sad context; N=24) was conducted with a talking-head agent presenting happy and sad novels to users. Supporting the hedonic preference principle, results showed that although both the happy and the sad agent were non-adaptive to the varying emotional tone of the context, the happy agent elicited greater intent to consume the books, more positive evaluations of the book reviews, more positive attitudes towards the agent and the interface, and a more positive user experience than the sad agent.

Introduction

Emotion is an essential factor in human psychology (Zajonc, 1997): it conveys feelings and attitudes, regulates motivation, colors cognition, and affects performance (Martin, 1986; Forgas, 1995). Human–computer interaction is no exception to the role of emotion (Brave and Nass, 2003). The traditional notion of computers as emotionless machines has given way to a view that infusing computers with emotion is both necessary and advantageous (Picard, 1997). Expressing emotions on computers has moved to the forefront of affective computing because it is the aspect that directly faces users. A major means of emotional expression on computers is through computer-based characters such as interface agents.

Computer-animated characters have become a common form of embodying interface agents (Dehn and van Mulken, 2000). Talking heads, which combine computer-synthesized faces with computer-synthesized speech, are in particular hailed as a lively and social representation of agents. Thanks to advances in facial animation, computer-synthesized faces are able to convincingly express emotions. “Baldi”, a computer-synthesized face developed by Massaro (1998) and colleagues, for example, can be readily manipulated in a GUI to show the six basic emotions defined by Ekman (1992): anger, disgust, fear, happiness, sadness, and surprise. Massaro (1998) showed that the emotional expressions of “Baldi” were correctly recognized by people at above-chance levels. Similarly, using a simple metal robotic face, Schiano et al. (2000) demonstrated above-chance recognition of that face’s emotional expressions. By varying the geometric intensity of a synthetic 2-D face, Bartneck and Reichenbach (2005) found that accurate recognition of basic emotions did not require extremely high geometric intensity in the facial animation.

Computer-synthesized speech (also known as text-to-speech or TTS) has also advanced to the point of expressing emotions effectively. By modeling the auditory parameters of human emotional expression, Cahn (1990) demonstrated above-chance recognition of six basic emotions in synthetic speech (emotionally neutral sentences, presented without any face). Emotion modeling in synthetic speech has since developed substantially (e.g. Iida et al., 2000; Burkhardt, 2005) and can be achieved effectively through a systematic framework of speech-parameter manipulation (Schröder, 2001).

When synthetic faces and speech are coupled to create a talking head and congruently convey the intended emotion, the resulting bi-modal talking head achieves even more effective emotional expressiveness (Massaro, 1998). Building on these technological successes, and going beyond assessing recognition accuracy of emotional expressions, recent research has begun to assess the effects of agents’ emotional expressions on users. Most of these studies focused on the agent’s emotional expression as a response to user events, with the general finding that users prefer emotional expressions, and particularly positive or empathic ones.

Prendinger et al. (2005) found that when an animated agent in a mathematical game showed affective responses to users’ performance, users evaluated the game and the experience more positively. Klein et al. (2002) found that their affect-support agent, represented with text and buttons in a GUI, alleviated users’ frustration and led them to play a subsequent game longer, compared to letting users vent their frustration or ignoring it. Using a synthetic voice as the agent and pre-programmed mouse delays to induce frustration, Partala and Surakka (2004) found that users’ affect, as physiologically measured via smiling and frowning, was most positive when the voice responded to the mouse delay with a positively toned intervention, second most positive when the intervention was negatively toned, and least positive when there was no intervention. Taking the inquiry one step further, Brave et al. (2005) manipulated agents’ facial expressions (happy or sad) either as an empathic response to the user’s winning or losing in a computer-based blackjack card game or as a self-oriented response to the agent’s own winning or losing. They found that it was empathic emotional responses that elicited more positive evaluations in terms of liking, trust, and judgment of the agent; self-oriented emotional responses made little difference compared to no emotional responses.

These studies demonstrated positive effects of agents’ emotional expressions, particularly when the agent’s expression is congruent with and appropriate to the valence of the user’s event. Specifically, the agent needs to react emotionally to the user, and when the interaction goes poorly, a positive tone fares better than a negative one. These results converge with the two general rules for emotional display in humans: contextual appropriateness and hedonic preference.

The dominant approach to emotional modeling follows theories of human emotional psychology (Picard, 1997; Fabri et al., 2004; Gratch and Marsella, 2005). The applicability of basic human–human interaction rules to human–computer interaction has been supported by numerous studies in the “Computers are social actors” paradigm (Reeves and Nass, 1996; Nass and Moon, 2000). This line of work has shown that people follow the same rules, often automatically and unconsciously, in their interactions with computers and interface representations as in their interactions with other people, across domains including gender, personality, ethnicity, reciprocity, and politeness. Thus, rules for humans’ emotional expressions should apply to computer-based characters. Their applicability in the emotion domain has already received initial support from the studies reviewed above, whose results align with the two general rules for human emotional expression in social interaction.

Two general rules for emotional expression are identified in the psychology literature: hedonic preference and contextual appropriateness. The hedonic preference principle refers to the human tendency to prefer experiencing, expressing, and perceiving positive emotions over negative ones (Frijda, 1988; Myers and Diener, 1995). In social interaction, people like others who display positive emotions more than those who display negative emotions (Berridge, 1999). A person who shows positive emotion is perceived to be more attractive and more appealing to work with than someone who shows negative emotion (Bell, 1978). Goldberg and Gorn (1987) found that commercials following a happy TV program were perceived as more effective, were liked more, and were recalled better.

However, most people do not express happiness in every context. Hedonic preference is constrained by contextual appropriateness: people are socialized to express emotions appropriate to specific social contexts (Ekman and Friesen, 1975; Rorty, 1985). In US culture, for example, happiness is assumed to be the appropriate emotion for winning a political campaign, and sadness for a funeral (Graham et al., 1981). Emotional socialization begins as early as infancy, during facial play between mothers and 3- to 6-month-old infants (Malatesta and Haviland, 1982), and the ability to express socially appropriate emotions develops as children grow older. In one study, for instance, 10-year-olds were more likely than 6- and 8-year-olds to suggest modulating emotions in situations requiring emotional management, such as receiving a disliked gift (Saarni, 1979). Special efforts are often made, for example through therapy, to help emotionally and behaviorally disturbed children learn to express appropriate emotions (Pollak and Thoits, 1989).

The rule of contextual appropriateness is well evidenced by the agent showing happiness or sadness for the user’s winning or losing a game (Brave et al., 2005). Understanding the nature and valence of the user’s interaction events is critical for guiding effective and appropriate emotional expressions by the agent (Klein et al., 2002). The rule of hedonic preference is also well reflected, for example, in the finding that positively toned interventions elicited the most positive responses (Partala and Surakka, 2004).

Detecting and understanding the nature of the human–computer interaction context is a necessary requirement for guiding appropriate selection and expression of emotions (Picard, 1997). A human–computer interaction context involving an interface agent includes at least the user, the agent, and the content involved in the interaction. The user’s actions, performance, and affective states are certainly the primary focus. But an indispensable part of the user’s reaction and judgment also hinges on their appraisal of the content being displayed in the interface and handled by the agent, and of how the agent handles it. A priori content prepared for the agent, such as news, stories, or product information, is independent of users’ subsequent reactions, yet how the agent presents this content critically shapes users’ reception and judgment of the information as well as of the agent. When building agents with emotionality for e-sales (McBreen and Jack, 2001; McBreen et al., 2001; Brahnam, 2005), for example, how the agent emotionally presents the product or service information will have a critical impact on users’ subsequent reactions and judgments.

Information and content in an e-retail shop or server can be enormous. A sales agent presenting books and their excerpts on a Web site like Amazon.com would face millions of books and, in the case of fiction, countless stories with varying emotional tones. The studies reviewed earlier either pre-programmed the emotional tone of the user’s responses (e.g. mouse delays causing frustration, Partala and Surakka, 2004) or dealt with easily detectable user actions such as winning or losing a game (Brave et al., 2005). Many applications require users to manually choose the intended emotions for agents or avatars (e.g. Takahashi et al., 2005). It would be impractical to manually pre-determine the emotional tone of each product, and the emotional tones of many rich instances of information cannot simply be indicated with buttons or a drop-down menu. And there remains the problem of reliably and automatically detecting the emotional tone of countless, ever-changing interaction content.

Hence, the state of today’s technology can easily enable agents, including synthetic faces and speech, to automatically generate effective positive or negative emotional expressions, but it cannot yet automatically and reliably detect the emotional tone of real-time interaction content and stipulate the appropriate emotions to display. In other words, without manual programming or simple and predictable scenarios, agents can easily fulfill the rule of hedonic preference but fall short of automatically meeting contextual appropriateness. An important practical question, then, is whether positive and negative emotional expressions would be equally undesirable when contextual appropriateness cannot be met, or whether positive emotion would still be preferable to negative emotion even if not ideal. The experimental study presented in this paper aimed to address this question.

A research question was formulated as:

RQ: Do happy and sad emotional expressions of an interface agent differ in their effects on users when both types of emotional expressions do not adapt to the changing contextual tone?

This question also has an important theoretical implication: does hedonic preference hold up without contextual appropriateness? As hedonic valence is a primary dimension of emotion, it is important to study whether hedonic preference is strong enough to overwhelm the social demand of contextual appropriateness. This question is difficult, and often not even applicable, to examine with people, because people are socially bound to adapt their emotional expressions to context. Computer interface agents make this inquiry pertinent, and also easier to study and manipulate.

Section snippets

Method

A 2 (happy vs. sad expression)×2 (happy vs. sad context) mixed-design experiment was conducted. The emotional expression of a computer-synthesized talking-head agent on a computer interface was varied as the between-participants factor. In one condition, the talking-head agent had a constantly happy expression, facially and vocally; in the other, a constantly sad expression, facially and vocally. Happy and sad emotional expressions were chosen because they are two of

Results

First, repeated-measures ANOVAs were conducted with the emotional expression of the talking head and the emotional tone of the books as the independent factors, and consumption intent and then the evaluation of book reviews as the dependent variables. Supporting the hedonic preference principle, the emotional expression of the agent had a significant main effect, F(1,22)=8.08, p<.01: the happy expression achieved greater intent to consume both the happy and the sad books than the sad
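For readers less familiar with the analysis, the reported between-participants main effect can be illustrated with a minimal sketch: in a two-group mixed design like this one (expression between participants, book tone within), the between-participants main effect reduces to a one-way ANOVA on each participant's mean score across the two within-participant conditions, with df = (1, N−2) = (1, 22) for N=24. The data below are invented for illustration only; they are not the study's data, and the variable names are hypothetical.

```python
# Illustrative sketch (invented data): the between-participants main effect in a
# 2 (expression: between) x 2 (context: within) mixed ANOVA equals a one-way
# ANOVA on each participant's mean across the two within-participant contexts.

def between_effect_f(group_a, group_b):
    """F statistic for a two-group one-way ANOVA; df = (1, n_a + n_b - 2)."""
    n_a, n_b = len(group_a), len(group_b)
    m_a = sum(group_a) / n_a
    m_b = sum(group_b) / n_b
    grand = (sum(group_a) + sum(group_b)) / (n_a + n_b)
    # Between-groups sum of squares (1 degree of freedom for two groups).
    ss_between = n_a * (m_a - grand) ** 2 + n_b * (m_b - grand) ** 2
    # Within-groups (error) sum of squares, pooled over both groups.
    ss_within = (sum((x - m_a) ** 2 for x in group_a)
                 + sum((x - m_b) ** 2 for x in group_b))
    return (ss_between / 1) / (ss_within / (n_a + n_b - 2))

# Each value is one participant's consumption intent averaged over the
# happy-book and sad-book contexts (hypothetical numbers, 12 per group).
happy_agent = [5.5, 6.0, 6.5, 5.0, 6.0, 7.0, 5.5, 6.5, 6.0, 5.0, 6.5, 6.0]
sad_agent = [4.0, 4.5, 5.0, 3.5, 4.5, 5.5, 4.0, 5.0, 4.5, 3.5, 5.0, 4.5]
f_stat = between_effect_f(happy_agent, sad_agent)  # df = (1, 22), as in the paper
```

The interaction and within-participant effects of the full mixed ANOVA require partitioning the within-subject variance separately, which this sketch omits for brevity.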

Discussion

The findings of this study consistently demonstrated that the constantly happy talking-head agent was preferred to the constantly sad one. Despite the lack of adaptation to the changing emotional tone in the content of the book reviews, the happy agent achieved greater consumption intent, more positive product evaluations, more positive attitudes towards the agent (in terms of liking, trustworthiness, and competence) and the Web-site interface, and a more positive user experience than the sad

References (39)

  • S. Brave et al. Emotion in human–computer interaction.
  • F. Burkhardt, 2005. Emofilt: The simulation of emotional speech by prosody-transformation. In: Proceedings of...
  • J.E. Cahn, 1990. The generation of affect in synthesized speech. Journal of the American Voice I/O Society.
  • P. Ekman, 1992. An argument for basic emotions. Cognition and Emotion.
  • P. Ekman et al., 1975. Unmasking the Face.
  • M. Fabri et al., 2004. Mediating the expression of emotion in educational collaborative virtual environments: an experimental study. International Journal of Virtual Reality.
  • L. Feldman Barrett et al., 1999. The structure of current affect: controversies and emerging consensus. Current Directions in Psychological Science.
  • J.P. Forgas, 1995. Mood and judgment: the affect infusion model (AIM). Psychological Bulletin.
  • N.H. Frijda, 1988. The laws of emotion. American Psychologist.