Learning Actions through Imitation and Exploration: Towards Humanoid Robots That Learn from Humans

Grimes, David B.; Rao, Rajesh P. N.

doi:10.1007/978-3-642-00616-6_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5436)

Abstract

A prerequisite for achieving brain-like intelligence is the ability to rapidly learn new behaviors and actions. A fundamental mechanism for rapid learning in humans is imitation: children routinely learn new skills (e.g., opening a door or tying a shoe lace) by imitating their parents; adults continue to learn by imitating skilled instructors (e.g., in tennis). In this chapter, we propose a probabilistic framework for imitation learning in robots that is inspired by how humans learn from imitation and exploration. Rather than relying on complex (and often brittle) physics-based models, the robot learns a dynamic Bayesian network that captures its dynamics directly in terms of sensor measurements and actions during an imitation-guided exploration phase. After learning, actions are selected based on probabilistic inference in the learned Bayesian network. We present results demonstrating that a 25-degree-of-freedom humanoid robot can learn dynamically stable, full-body imitative motions simply by observing a human demonstrator.
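The idea of learning dynamics directly from sensor measurements and actions, then choosing actions by inference, can be illustrated with a deliberately simplified sketch. This is not the chapter's dynamic Bayesian network: it assumes a one-step linear-Gaussian forward model, a scalar sensor and action, random exploration data, and a discrete set of candidate actions. The function names `fit_forward_model` and `select_action` and all numeric values are hypothetical.

```python
import numpy as np


def fit_forward_model(states, actions, next_states):
    """Least-squares fit of next_state ~ W^T [state, action, 1] (assumed linear-Gaussian)."""
    X = np.hstack([states, actions, np.ones((len(states), 1))])
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    residuals = next_states - X @ W
    # Per-dimension noise variance; small floor avoids division by zero.
    var = residuals.var(axis=0) + 1e-6
    return W, var


def select_action(W, var, state, target, candidates):
    """Pick the candidate action whose predicted outcome is most likely to match the target."""
    n = len(candidates)
    X = np.hstack([np.tile(state, (n, 1)), candidates, np.ones((n, 1))])
    predicted = X @ W
    # Gaussian log-likelihood of the target, up to an action-independent constant.
    loglik = -0.5 * (((target - predicted) ** 2) / var).sum(axis=1)
    return candidates[np.argmax(loglik)]


# Toy "exploration" data from an assumed system: s' = 0.9 s + 0.5 a + noise.
rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(200, 1))
actions = rng.uniform(-1, 1, size=(200, 1))
next_states = 0.9 * states + 0.5 * actions + rng.normal(0, 0.01, size=(200, 1))

W, var = fit_forward_model(states, actions, next_states)

# Choose an action that should move the sensor reading from 0.2 toward 0.4.
candidates = np.linspace(-1, 1, 41).reshape(-1, 1)
best = select_action(W, var, np.array([0.2]), np.array([0.4]), candidates)
print("learned weights:", W.ravel())
print("selected action:", best)
```

In this toy setting the model is fit purely from observed (state, action, next state) triples, with no physics model, and the action is chosen by scoring candidates under the learned predictive distribution. A full-body humanoid would need a richer, multi-step model and constraints such as balance, which this sketch omits.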

Author information

Authors and Affiliations

University of Washington, Seattle, WA 98195, USA
David B. Grimes & Rajesh P. N. Rao

Editor information

Editors and Affiliations

Honda Research Institute Europe GmbH, 63073 Offenbach/Main, Germany
Bernhard Sendhoff
Honda Research Institute Europe GmbH, Carl-Legien-Strasse 30, 63073, Offenbach/Main, Germany
Edgar Körner
Dept. of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405, USA
Olaf Sporns
Faculty of Technology, Neuroinformatics Group, Bielefeld University, Universitätsstr. 25, 33615, Bielefeld, Germany
Helge Ritter
Okinawa Institute of Science and Technology, Neural Computation Unit, 12-22 Suzaki, Uruma, 904-2234, Okinawa, Japan
Kenji Doya

About this chapter

Cite this chapter

Grimes, D.B., Rao, R.P.N. (2009). Learning Actions through Imitation and Exploration: Towards Humanoid Robots That Learn from Humans. In: Sendhoff, B., Körner, E., Sporns, O., Ritter, H., Doya, K. (eds) Creating Brain-Like Intelligence. Lecture Notes in Computer Science, vol 5436. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00616-6_7

DOI: https://doi.org/10.1007/978-3-642-00616-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00615-9
Online ISBN: 978-3-642-00616-6
eBook Packages: Computer Science, Computer Science (R0)
