How to Train Your Avatar: A Data Driven Approach to Gesture Generation

  • Conference paper
Intelligent Virtual Agents (IVA 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6895)

Abstract

The ability to gesture is key to realizing virtual characters that can engage in face-to-face interaction with people. Many applications predefine the possible utterances of a virtual character and hand-build every gesture animation those utterances require. We can save much of that effort if we can construct a general gesture controller that generates behavior for novel utterances. Because the dynamics of human gestures are related to the prosody of speech, in this work we propose a model that generates gestures from prosody. We then assess the naturalness of the animations by comparing them against human gestures. The evaluation results were promising: human judges found no significant difference between our generated gestures and the original human gestures, and they rated the generated gestures significantly better than real human gestures taken from a different utterance.
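
To make the abstract's pipeline concrete, below is a minimal sketch of the prosody-to-gesture contract it describes: extract per-frame prosody features (pitch and intensity) from speech, then map them to joint-angle frames through a learned model. Everything in the sketch is an assumption rather than the authors' implementation: it uses librosa for feature extraction, substitutes a random linear map for the trained model, and uses hypothetical names such as utterance.wav and GestureModel.

import numpy as np
import librosa

def prosody_features(wav_path, hop_length=512):
    """Per-frame pitch (F0, Hz) and intensity (RMS) of a speech recording."""
    y, sr = librosa.load(wav_path, sr=16000)
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y,
        fmin=librosa.note_to_hz("C2"),
        fmax=librosa.note_to_hz("C6"),
        sr=sr,
        hop_length=hop_length,
    )
    f0 = np.nan_to_num(f0)  # pyin returns NaN on unvoiced frames; zero them
    rms = librosa.feature.rms(y=y, hop_length=hop_length)[0]
    n = min(len(f0), len(rms))  # the two trackers can differ by a frame
    return np.stack([f0[:n], rms[:n]], axis=1)  # shape (frames, 2)

class GestureModel:
    """Hypothetical stand-in for the learned prosody-to-motion mapping.

    The paper learns such a mapping from human motion data; a random
    linear map is used here only to show the input/output shapes:
    prosody frames in, joint-angle frames out.
    """

    def __init__(self, n_joints=20, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(2, n_joints))

    def generate(self, prosody):
        return prosody @ self.W  # shape (frames, n_joints)

if __name__ == "__main__":
    feats = prosody_features("utterance.wav")  # hypothetical input file
    motion = GestureModel().generate(feats)
    print(motion.shape)  # one joint-angle vector per prosody frame

In practice the stand-in linear map would be replaced by a temporal model trained on speech recordings synchronized with motion capture, since plausible gesture dynamics depend on context across frames rather than on each frame in isolation.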

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chiu, CC., Marsella, S. (2011). How to Train Your Avatar: A Data Driven Approach to Gesture Generation. In: Vilhjálmsson, H.H., Kopp, S., Marsella, S., Thórisson, K.R. (eds) Intelligent Virtual Agents. IVA 2011. Lecture Notes in Computer Science (LNAI), vol. 6895. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23974-8_14

  • DOI: https://doi.org/10.1007/978-3-642-23974-8_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23973-1

  • Online ISBN: 978-3-642-23974-8

  • eBook Packages: Computer Science, Computer Science (R0)
