Abstract
Realizing effective listening behavior in virtual humans has become a key area of research, especially as the field has sought to model more complex social scenarios involving multiple participants and bystanders. A human listener’s nonverbal behavior is conditioned by a variety of factors, from the current speaker’s behavior to the listener’s role, desire to participate in the conversation, and unfolding comprehension of the speaker. Similarly, we seek to create virtual humans able to provide feedback based on their participatory goals and their unfolding understanding of, and reaction to, the relevance of what the speaker is saying as the speaker speaks. Drawing on a survey of the psychological literature as well as recent technological advances in the recognition and partial understanding of natural language, we describe a model that integrates these factors into a virtual human whose behavior is consistent with these goals. We then discuss how the model is implemented in a virtual human architecture and present an evaluation of the behaviors used in the model.
Notes
The forced choice obviously simplifies this decoding task for the observer, but the use of gibberish makes it harder.
Acknowledgments
This work was sponsored by the U.S. Army Research, Development, and Engineering Command (RDECOM). The content does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred. We would also like to thank our colleagues Drs. David DeVault, Louis-Philippe Morency, and David Traum for all their help in implementing this model.
Cite this article
Wang, Z., Lee, J. & Marsella, S. Multi-party, multi-role comprehensive listening behavior. Auton Agent Multi-Agent Syst 27, 218–234 (2013). https://doi.org/10.1007/s10458-012-9215-8