
Learning behavior styles with inverse reinforcement learning

Published: 26 July 2010

Abstract

We present a method for inferring the behavior styles of character controllers from a small set of examples. We show that a rich set of behavior variations can be captured by determining the appropriate reward function within the reinforcement learning framework, and that the discovered reward function can be applied to different environments and scenarios. We also introduce a new algorithm for recovering the unknown reward function that improves on the original apprenticeship learning algorithm. The recovered reward function, which represents a behavior style, can be applied to a variety of tasks while preserving the key features of the style present in the given examples. Finally, we describe an adaptive process in which an author can, with just a few additional examples, refine the behavior so that it generalizes better.
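The "original apprenticeship learning algorithm" referred to above is that of Abbeel and Ng (2004), which recovers a reward function by matching feature expectations between the learned policy and the example behavior. The paper's improved recovery algorithm is not reproduced here; the following is only a minimal sketch of that baseline, projection-style loop, and the names solve_mdp, feature_expectations, and all parameters are hypothetical placeholders rather than the authors' code.

```python
import numpy as np

def apprenticeship_learning(mu_expert, solve_mdp, feature_expectations,
                            n_iters=20, eps=1e-3):
    """Projection-style apprenticeship learning, in the spirit of Abbeel & Ng (2004).

    mu_expert            -- feature-expectation vector estimated from the examples
    solve_mdp(w)         -- returns an optimal policy for the reward R(s) = w . phi(s)
    feature_expectations -- returns the feature-expectation vector of a policy
    All names and defaults are illustrative assumptions, not the paper's method.
    """
    # Initialize with an arbitrary reward and the resulting policy.
    w = np.random.randn(mu_expert.shape[0])
    policy = solve_mdp(w)
    mu_bar = feature_expectations(policy)

    for _ in range(n_iters):
        # The reward weights point from the current estimate toward the expert.
        w = mu_expert - mu_bar
        if np.linalg.norm(w) <= eps:   # expert's feature counts are matched
            break
        policy = solve_mdp(w)
        mu = feature_expectations(policy)
        # Project mu_expert onto the line through mu_bar and mu
        # (the "projection" variant of the algorithm).
        d = mu - mu_bar
        mu_bar = mu_bar + (d @ (mu_expert - mu_bar)) / (d @ d) * d

    return w, policy
```

In this view, a behavior style is encoded entirely by the recovered weight vector w, which is why, as the abstract notes, the same reward function can be re-optimized in new environments and tasks while the style is preserved.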


Supplemental Material

tp069-10.mp4 (MP4 video, 43 MB)



Published in

ACM Transactions on Graphics, Volume 29, Issue 4
July 2010
942 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/1778765

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

Published: 26 July 2010
Published in ACM Transactions on Graphics, Volume 29, Issue 4
