Abstract
This work addresses multimodal, expressive synthesis on virtual agents, based on the analysis of actions performed by human users. As input we consider the image sequence of the recorded human behavior. Computer vision and image processing techniques are incorporated to detect the cues needed for expressivity feature extraction. The multimodality of the approach lies in the fact that both facial and gestural aspects of the user's behavior are analyzed and processed. The mimicry consists of perception, interpretation, planning and animation of the expressions shown by the human, resulting not in an exact duplicate but rather in an expressive model of the user's original behavior.
Cite this article
Caridakis, G., Raouzaiou, A., Bevacqua, E. et al. Virtual agent multimodal mimicry of humans. Lang Resources & Evaluation 41, 367–388 (2007). https://doi.org/10.1007/s10579-007-9057-1