More than just a pretty face: conversational protocols and the affordances of embodiment

https://doi.org/10.1016/S0950-7051(00)00102-7

Abstract

Prior research into embodied interface agents has found that users like them and find them engaging. However, results on the effectiveness of these interfaces for task completion have been mixed. In this paper, we argue that embodiment can serve an even stronger function if system designers use actual human conversational protocols in the design of the interface. Communicative behaviors such as salutations and farewells, conversational turn-taking with interruptions, and describing objects using hand gestures are examples of protocols that all native speakers of a language already know how to perform and can thus be leveraged in an intelligent interface. We discuss how these protocols are integrated into Rea, an embodied, multi-modal interface agent who acts as a real-estate salesperson, and we show why embodiment is required for their successful implementation.

Introduction

There is a qualitative difference between face-to-face conversation and other forms of human–human communication [1]. Businesspeople and academics routinely travel long distances to conduct certain face-to-face interactions when electronic forms of communication would seemingly work just as well. When someone has something really important to say, they say it in person.

The qualitative difference in these situations is not just that we enjoy looking at humans more than at computer screens, but also that the human body enables the use of certain communication protocols in face-to-face conversation which provide a richer and more robust channel of communication than is afforded by any other medium available today. Gaze, gesture, intonation, and body posture play an essential role in the proper execution of many conversational functions — such as conversation initiation and termination, turn-taking, interruption handling, feedback and error correction — and these kinds of behaviors enable the exchange of multiple levels of information in real time. People are extremely adept at extracting meaning from subtle variations in the performance of these behaviors; for example, slight variations in pause length, feedback-nod timing or gaze behavior can significantly alter the interpretation of an utterance (consider “you did a great job” vs. “you did a … great job”).
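
To make the idea of a conversational protocol concrete, the following is a minimal sketch, in Python, of a turn-taking manager that reacts to pause and gaze cues and yields the floor on interruption. The states, cues and method names are our own illustrative simplification, not the implementation described in this paper.

    from enum import Enum, auto

    class Floor(Enum):
        USER = auto()
        AGENT = auto()

    class TurnTaking:
        """Toy turn-taking protocol: hold or yield the speaking floor
        based on speech and gaze cues, and handle user interruptions."""

        def __init__(self) -> None:
            self.floor = Floor.USER  # the user opens the conversation

        def on_user_speech_start(self) -> None:
            if self.floor is Floor.AGENT:
                # Barge-in: overlapping user speech is an interruption,
                # so the agent stops talking rather than speaking over it.
                self.stop_speaking()
            self.floor = Floor.USER

        def on_user_pause(self, user_gazing_at_agent: bool) -> None:
            # A pause plus gaze toward the listener is a common
            # turn-giving cue; only then does the agent take the floor.
            if user_gazing_at_agent:
                self.floor = Floor.AGENT
                self.start_speaking()

        def start_speaking(self) -> None:
            print("agent takes the floor: speech, gaze toward user")

        def stop_speaking(self) -> None:
            print("agent yields the floor: falls silent, gazes at user")

    protocol = TurnTaking()
    protocol.on_user_pause(user_gazing_at_agent=True)  # agent takes the floor

Note how much of the protocol is carried by non-verbal cues (gaze, pause): a disembodied interface has no way to produce or perceive them.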

Of particular interest to interface designers is that these communication protocols come for “free” in that users do not need to be trained in their use; all native speakers of a given language have these skills and use them daily. An embodied interface agent which exploits these protocols has the potential to provide a higher bandwidth of communication than would otherwise be possible.

Of course, depictions of human bodies are also more decorative than menus on a screen and, like any new interface design, they are currently quite in vogue and therefore attractive to many users. Unfortunately, many embodied interface agents developed to date do not go beyond this ornamental or novelty value. Aside from the use of pointing gestures and two or three facial expressions, an extensive wardrobe and a coyly cocked head, many animated interface agents provide little more than something amusing to look at while the same old system handles the mechanics of the interaction. It is no wonder that these systems have been found to be likable and engaging, but to provide little improvement in task performance over text- or speech-only interfaces.

In this paper, we first review the embodied interface agents developed to date and summarize the results of evaluations performed on them. We then discuss several human communication protocols, along with their utility for interfaces and their requirements for embodiment. Finally, we present Rea, an embodied interface agent that implements these protocols, and describe our ongoing research program to develop embodied interface agents that leverage knowledge of human communication skills.

Section snippets

Related work

Other researchers have built embodied interface agents with varying degrees of conversational ability. Olga is an embodied humanoid agent that allows the user to employ speech, keyboard and mouse commands to engage in a conversation about microwave ovens [2]. Olga has a distributed client–server architecture with separate modules for language processing, interaction management, the direct-manipulation interface and output animation, all communicating through a central server. Olga is event driven, and …
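
The event-driven, hub-and-spoke design described for Olga is easy to picture in code. The sketch below, with hypothetical module and event names of our choosing, shows modules that communicate only through a central server, never directly with one another:

    from collections import defaultdict
    from typing import Callable, Dict, List

    Event = dict
    Handler = Callable[[Event], None]

    class CentralServer:
        """Toy event hub in the style of the Olga architecture sketched
        above: modules register for event types with the server, which
        dispatches each published event to every subscriber."""

        def __init__(self) -> None:
            self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

        def subscribe(self, event_type: str, handler: Handler) -> None:
            self._subscribers[event_type].append(handler)

        def publish(self, event_type: str, event: Event) -> None:
            # Event-driven dispatch: notify every registered module.
            for handler in self._subscribers[event_type]:
                handler(event)

    server = CentralServer()
    server.subscribe("user_utterance",
                     lambda e: print("language processing gets:", e["text"]))
    server.subscribe("user_utterance",
                     lambda e: print("interaction manager gets:", e["text"]))
    server.publish("user_utterance", {"text": "How long do I defrost bread?"})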

Human communication protocols requiring embodiment

Providing the interface with a body allows the system to engage in a wide range of multi-modal behaviors that, when executed in tight temporal synchronization with language, carry out a communicative function. It is important to understand that particular behaviors, such as the raising of the eyebrows, can be employed in a variety of circumstances to realize different communicative functions, and that the same communicative function may be realized through different sets of behaviors. It is …
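
The function-to-behavior relation described here is many-to-many, which a small lookup sketch makes concrete. The function and behavior names below are illustrative examples drawn from the discussion above, not the paper's actual inventory:

    # Illustrative many-to-many mapping from communicative functions to
    # candidate behavior sets. One function can be realized by several
    # behavior combinations, and one behavior (raised eyebrows) serves
    # several functions, so meaning lives in the mapping, not in the
    # behavior itself.
    FUNCTION_TO_BEHAVIORS = {
        "give_turn":     [("pause", "gaze_at_listener"),
                          ("pause", "raise_eyebrows")],
        "emphasize":     [("raise_eyebrows",), ("beat_gesture",)],
        "give_feedback": [("head_nod",), ("short_vocalization",)],
    }

    def realize(function: str, available_behaviors: set) -> tuple:
        """Pick the first behavior set the current embodiment can
        produce; an agent without a face or hands loses most options."""
        for behaviors in FUNCTION_TO_BEHAVIORS[function]:
            if set(behaviors) <= available_behaviors:
                return behaviors
        return ()  # the function cannot be realized without embodiment

    print(realize("emphasize", {"raise_eyebrows", "head_nod", "pause"}))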

Rea: an embodied conversational agent

The Rea project at the MIT Media Lab [23], [24] has as its goal the construction of an embodied, multi-modal, real-time conversational interface agent. Rea implements the conversational protocols described above, on the basis of the FMTB model, in order to make interactions as natural as face-to-face conversation with another person. In the current task domain, Rea acts as a real estate salesperson, answering user questions about properties in her database and showing users around virtual …
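
To show how a Rea-like response step couples domain content with protocol behaviors, here is a toy sketch; the property data, slot names and behavior labels are hypothetical, not Rea's actual representations:

    # Hypothetical property "database" for a toy real-estate agent.
    PROPERTIES = {
        "boston_condo": {"bedrooms": 2, "price": 350_000},
    }

    def answer(slot: str, listing: str) -> dict:
        value = PROPERTIES[listing][slot]
        return {
            # Propositional content: the domain answer itself.
            "speech": f"The {slot} of this property is {value}.",
            # Interactional content: protocol behaviors produced in
            # tight synchrony with the speech.
            "behaviors": ["gaze_at_user", "iconic_gesture", "give_turn"],
        }

    print(answer("bedrooms", "boston_condo"))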

Future work

Rea is still a little clumsy in conversation; perhaps not yet your real estate agent of choice. One line of research that we are pursuing is to increase the symmetry of input and output by beginning to sense more conversational protocols in the user, as well as generate more of those protocols in output. To this end, we have begun to develop a sensor to measure head movements and eye gaze using a separate vision system that will estimate what direction the user's face is pointing. This …
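
Head-direction sensing of the kind mentioned here is commonly done by fitting a rigid face model to tracked 2D landmarks. Below is a standard sketch using OpenCV's solvePnP, not the vision system described in the text; it assumes an upstream face tracker supplies six landmark positions in the order of MODEL_POINTS:

    import numpy as np
    import cv2

    # Generic 3D face-model points (nose tip, chin, eye corners, mouth
    # corners) in millimetres; a common approximation, not Rea's model.
    MODEL_POINTS = np.array([
        (0.0, 0.0, 0.0),          # nose tip
        (0.0, -330.0, -65.0),     # chin
        (-225.0, 170.0, -135.0),  # left eye outer corner
        (225.0, 170.0, -135.0),   # right eye outer corner
        (-150.0, -150.0, -125.0), # left mouth corner
        (150.0, -150.0, -125.0),  # right mouth corner
    ], dtype=np.float64)

    def head_direction(image_points: np.ndarray,
                       frame_size: tuple) -> np.ndarray:
        """Estimate where the user's face is pointing from six 2D
        landmarks (same order as MODEL_POINTS), assumed to come from
        an upstream face tracker. Returns a 3D forward vector in
        camera coordinates."""
        h, w = frame_size
        focal = w  # rough pinhole-camera approximation
        camera_matrix = np.array([[focal, 0, w / 2],
                                  [0, focal, h / 2],
                                  [0, 0, 1]], dtype=np.float64)
        dist_coeffs = np.zeros((4, 1))  # assume no lens distortion
        ok, rvec, _tvec = cv2.solvePnP(MODEL_POINTS, image_points,
                                       camera_matrix, dist_coeffs)
        rotation, _ = cv2.Rodrigues(rvec)
        return rotation @ np.array([0.0, 0.0, 1.0])  # face-forward axis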

Conclusions

In this paper we have argued that embodied interface agents can provide a qualitative advantage over non-embodied interfaces, if the bodies are used in ways that leverage knowledge of human communicative behavior. We demonstrated our approach with the Rea system. Increasingly capable of making an intelligent content-oriented — or propositional — contribution to the conversation in several modalities, Rea is also sensitive to the regulatory — or interactional — function of verbal and non-verbal …

Acknowledgements

Thanks to the other members of the REA team — in particular David Mellis and Nina Yu — and to Jennifer Smith and Matthew Stone for their contribution to the work and comments on this paper. Thanks to Candy Sidner and several anonymous reviewers for helpful comments that improved the article. Research leading to the preparation of this article was supported by the National Science Foundation (award IIS-9618939), AT&T, Deutsche Telekom, and other generous sponsors of the MIT Media Lab.

References (33)

  • E. Boyle et al., The effects of visibility in a cooperative problem solving task, Language and Speech (1994).
  • J. Beskow, S. McGlashan, Olga: a conversational agent with gestures. IJCAI’97, …
  • A. Takeuchi, K. Nagao, Communicative facial displays as a new conversational modality. InterCHI’93, ACM Press, …
  • R. Brooks, A Robust Layered Control System for a Mobile Robot, MIT AI Lab, Cambridge, MA, …
  • J.C. Lester et al., Cosmo: a life-like animated pedagogical agent with deictic believability. IJCAI’97, …
  • J. Rickel et al., Animated agents for procedural training in virtual reality: perception, cognition, and motor control, Applied Artificial Intelligence (1999).
  • J. Cassell et al., Animated conversation: rule-based generation of facial expression, gesture and spoken intonation for …
  • K. Thorisson, Communicative Humanoids: A Computational Model of Psychosocial Dialogue Skills, Ph.D. thesis, MIT Media Laboratory (1996).
  • S. Prevost et al., Face-to-face interfaces. CHI’99, ACM Press, …
  • W. Wahlster et al., Designing illustrated texts. EACL’91, …
  • N. Green et al., A media-independent content language for integrated text and graphics generation. Workshop on Content …
  • S. Feiner et al., Automating the generation of coordinated multimedia explanations, IEEE Computer (1991).
  • T. Koda, P. Maes, Agents with faces: the effects of personification of agents. Fifth IEEE International Workshop on …
  • A. Takeuchi, T. Naito, Situated facial displays: towards social interaction. Human Factors in Computing Systems: …
  • S. Kiesler et al., Social human–computer interaction.
  • E. Andre, T. Rist, J. Muller, Integrating reactive and scripted behaviors in a life-like presentation agent. …