
Developing Embodied Agents for Education Applications with Accurate Synchronization of Gesture and Speech

  • Chapter

Part of the book series: Lecture Notes in Computer Science (TCCI, volume 9420)

Abstract

Embodied agents have great potential in the education field, where they promise to maximize learners' learning gains and enjoyment. In many education applications, multimodal representation of embodied agents is a powerful approach for obtaining these benefits, and it requires accurate synchronization of gesture and speech. For this purpose, we investigate the issues most important to synchronization through a preliminary case study, use them as a practical guideline for our algorithm design, and propose a two-step synchronization method. Our case study reveals that two issues, duration and timing, play an important role in synchronizing gesture with speech. Treating synchronization as a motion synthesis problem rather than the behavior scheduling problem addressed by conventional methods, we employ a motion graph technique with constraints on gesture structure for coarse synchronization in the first step, and refine the result by shifting and scaling the gesture in the second step. Subjective evaluation has demonstrated that the proposed method achieves more accurate synchronization with respect to both duration and timing, and higher motion quality, than state-of-the-art methods.

Furthermore, we have implemented the proposed synchronization method in an authoring tool for education applications. We have conducted several experiments at a university, whose results demonstrate that our system makes the creation of attractive animations easier and faster than manual creation of equal quality (requiring only about 10% of the operation time), and that embodied agents are effective in education applications.
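The abstract describes the two-step method only at a high level. As a rough illustration, the following minimal Python sketch shows how such a pipeline could be organized, under our own assumptions: the Gesture and SpeechPhrase structures, the stroke_time and stress_time fields, and the flat candidate scan standing in for a true motion-graph search are all hypothetical, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Gesture:
    """One gesture clip from a motion database (all times in seconds)."""
    clip_id: int
    duration: float      # total clip length
    stroke_time: float   # when the expressive stroke phase peaks

@dataclass
class SpeechPhrase:
    """One synthesized speech phrase (all times in seconds)."""
    duration: float
    stress_time: float   # when the stressed syllable occurs

def coarse_select(candidates: List[Gesture], phrase: SpeechPhrase) -> Gesture:
    # Step 1 (coarse): pick the gesture whose duration best matches the
    # phrase.  The paper searches a motion graph under gesture-structure
    # constraints; scanning a flat candidate list is a simplification.
    return min(candidates, key=lambda g: abs(g.duration - phrase.duration))

def refine(gesture: Gesture, phrase: SpeechPhrase) -> Tuple[float, float]:
    # Step 2 (refine): uniformly time-scale the clip to the phrase
    # duration, then shift it so the scaled stroke lands exactly on the
    # stressed syllable.  Returns (start_offset, time_scale).
    scale = phrase.duration / gesture.duration
    start = phrase.stress_time - gesture.stroke_time * scale
    return start, scale

# Toy usage: one phrase, three candidate gesture clips.
phrase = SpeechPhrase(duration=1.8, stress_time=0.9)
candidates = [
    Gesture(clip_id=0, duration=1.2, stroke_time=0.5),
    Gesture(clip_id=1, duration=1.9, stroke_time=0.8),
    Gesture(clip_id=2, duration=3.0, stroke_time=1.4),
]
best = coarse_select(candidates, phrase)
start, scale = refine(best, phrase)
print(f"clip {best.clip_id}: start offset {start:+.2f}s, time scale {scale:.2f}")
```

The design point suggested by the abstract is that the first step only needs a coarse duration match, because the second step can absorb the residual error by shifting and uniformly scaling the clip so that its stroke coincides with the stressed part of the speech.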


Notes

  1. A growth point is assumed to be a minimal psychological unit with a special focus on speech-gesture synchrony and co-expressivity.

  2. The t-test is most commonly applied to determine whether two sets of data differ significantly from each other when the samples follow a normal distribution (a short worked example follows these notes).

  3. A lower p-value indicates stronger evidence that a significant difference exists.

  4. In this paper, the name MikuMikuDance has two meanings: it may refer to the authoring tool used in Sect. 3.1, or to the specifications for mesh models, motion, and other animation data.
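As a concrete illustration of notes 2 and 3, here is a minimal two-sample t-test in Python using SciPy; the rating data are invented for illustration and are not the paper's results.

```python
from scipy import stats

# Hypothetical 7-point subjective ratings for two methods; these numbers
# are invented for illustration and are not the paper's data.
proposed = [6.1, 5.8, 6.4, 6.0, 5.9, 6.3]
baseline = [5.2, 5.0, 5.6, 4.9, 5.3, 5.1]

# Two-sample t-test, appropriate when both samples are roughly normal.
t_stat, p_value = stats.ttest_ind(proposed, baseline)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# A small p-value (conventionally p < 0.05) is taken as evidence of a
# significant difference between the two sets of ratings.
```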


Acknowledgments

We greatly appreciate all the participants in our experiments, especially Prof. Shirotomo Aizawa and his students at Nagoya University of Arts and Sciences, Japan.

Author information

Corresponding author: Jianfeng Xu.


Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Xu, J., Nagai, Y., Takayama, S., Sakazawa, S. (2015). Developing Embodied Agents for Education Applications with Accurate Synchronization of Gesture and Speech. In: Nguyen, N., Kowalczyk, R., Duval, B., van den Herik, J., Loiseau, S., Filipe, J. (eds) Transactions on Computational Collective Intelligence XX. Lecture Notes in Computer Science, vol 9420. Springer, Cham. https://doi.org/10.1007/978-3-319-27543-7_1


  • DOI: https://doi.org/10.1007/978-3-319-27543-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27542-0

  • Online ISBN: 978-3-319-27543-7

  • eBook Packages: Computer Science (R0)
