
Speed adaptation for self-improvement of skills learned from user demonstrations

Published online by Cambridge University Press:  15 June 2015

Rok Vuga*, Bojan Nemec and Aleš Ude
Affiliation:
Humanoid and Cognitive Robotics Lab, Department of Automatics, Biocybernetics and Robotics, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia. E-mails: bojan.nemec@ijs.si, ales.ude@ijs.si
*Corresponding author. E-mail: rok.vuga@ijs.si

Summary

The paper addresses the problem of speed adaptation of movements subject to environmental constraints. Our approach relies on a novel formulation of velocity profiles as an extension of dynamic movement primitives (DMPs). The framework allows for a compact representation of non-uniformly accelerated motion as well as simple modulation of the movement parameters. In the paper, we evaluate two model-free methods by which optimal parameters can be obtained: iterative learning control (ILC) and policy-search-based reinforcement learning (RL). The applicability of each method is discussed and evaluated on two distinct cases that are hard to model using standard techniques: the first deals with hard contacts with the environment, while the second involves liquid dynamics. We find ILC to be very efficient in cases where task performance can be easily described with an error function. RL, on the other hand, has stronger convergence properties and can therefore provide a solution in the general case.
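As a rough illustration of the kind of temporal modulation the DMP framework supports, the sketch below integrates a minimal one-dimensional DMP and shows how a temporal scaling factor tau slows the motion along the same path. This is only a generic DMP sketch, not the paper's velocity-profile formulation: the forcing term is omitted and the gains (alpha_z, beta_z, alpha_x) are illustrative values.

```python
import numpy as np

def integrate_dmp(g, y0, tau, dt=0.001, T=1.0,
                  alpha_z=48.0, beta_z=12.0, alpha_x=2.0):
    """Minimal 1-D discrete DMP (forcing term omitted for brevity).
    tau is the temporal scaling factor: a larger tau yields a slower
    motion along the same path."""
    y, z, x = y0, 0.0, 1.0
    n_steps = int(round(T * tau / dt))
    traj = np.empty(n_steps)
    for i in range(n_steps):
        # transformation system: critically damped spring toward goal g
        dz = alpha_z * (beta_z * (g - y) - z) / tau
        dy = z / tau
        # canonical (phase) system, scaled by the same tau
        dx = -alpha_x * x / tau
        z += dz * dt
        y += dy * dt
        x += dx * dt
        traj[i] = y
    return traj

# Doubling tau doubles the duration and roughly halves the velocity:
fast = integrate_dmp(g=1.0, y0=0.0, tau=1.0)
slow = integrate_dmp(g=1.0, y0=0.0, tau=2.0)
```

Because tau divides both the transformation and the canonical system, the spatial path is preserved while the speed profile is stretched uniformly; the paper's contribution goes beyond this uniform scaling to non-uniform velocity profiles learned by ILC or RL.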

Type
Articles
Copyright
Copyright © Cambridge University Press 2015 


Footnotes

Initial results on the topic were presented at the IEEE-RAS International Conference on Humanoid Robots (Humanoids 2013), Atlanta, Georgia [1].

References

1. Nemec, B., Gams, A. and Ude, A., "Velocity Adaptation for Self-Improvement of Skills Learned from User Demonstrations," Proceedings of the IEEE-RAS International Conference on Humanoid Robots, Humanoids 2013, Atlanta, Georgia, USA (2013).
2. Wolpert, D. M., Diedrichsen, J. and Flanagan, J. R., "Principles of sensorimotor learning," Nature Rev. Neurosci. 12 (12), 739–751 (2011).
3. Bentivegna, D. C., Atkeson, C. G. and Cheng, G., "Learning tasks from observation and practice," Robot. Auton. Syst. 47 (2–3), 163–169 (2004).
4. Miyamoto, H., Schaal, S., Gandolfo, F., Gomi, H., Koike, Y., Osu, R., Nakano, E., Wada, Y. and Kawato, M., "A kendama learning robot based on bi-directional theory," Neural Netw. 9 (8), 1281–1302 (1996).
5. Peters, J. and Schaal, S., "Reinforcement learning of motor skills with policy gradients," Neural Netw. 21 (4), 682–697 (2008).
6. Stulp, F., Theodorou, E. and Schaal, S., "Reinforcement learning with sequences of motion primitives for robust manipulation," IEEE Trans. Robot. 28 (6), 1360–1370 (2012).
7. Calinon, S., Guenter, F. and Billard, A., "On learning, representing, and generalizing a task in a humanoid robot," IEEE Trans. Syst. Man Cybern. Part B 37 (2), 286–298 (2007).
8. Kormushev, P., Calinon, S. and Caldwell, D. G., "Imitation learning of positional and force skills demonstrated via kinesthetic teaching and haptic input," Adv. Robot. 25 (5), 581–603 (2011).
9. Buchli, J., Stulp, F., Theodorou, E. and Schaal, S., "Learning variable impedance control," Int. J. Robot. Res. 30 (7), 820–833 (2011).
10. Hollerbach, J. M., "Dynamic Scaling of Manipulator Trajectories," American Control Conference (Jun. 1983) pp. 752–756.
11. Bobrow, J. E., Dubowsky, S. and Gibson, J. S., "Time-optimal control of robotic manipulators along specified paths," Int. J. Robot. Res. 4 (3), 3–17 (1985).
12. Shin, K. and McKay, N. D., "Minimum-time control of robotic manipulators with geometric path constraints," IEEE Trans. Autom. Control 30 (6), 531–541 (Jun. 1985).
13. McCarthy, J. and Bobrow, J., "The Number of Saturated Actuators and Constraint Forces During Time-Optimal Movement of a General Robotic System," Proceedings of the 1992 IEEE International Conference on Robotics and Automation, vol. 1 (May 1992) pp. 542–546.
14. Žlajpah, L., "On Time Optimal Path Control of Manipulators with Bounded Joint Velocities and Torques," Proceedings of the 1996 IEEE International Conference on Robotics and Automation, Minneapolis, Minnesota (1996) pp. 1572–1577.
15. Dahl, O. and Nielsen, L., "Torque Limited Path Following by On-Line Trajectory Time Scaling," Proceedings of the 1989 IEEE International Conference on Robotics and Automation, vol. 2 (May 1989) pp. 1122–1128.
16. Dahl, O., "Path Constrained Robot Control with Limited Torques – Experimental Evaluation," Proceedings of the 1993 IEEE International Conference on Robotics and Automation, vol. 2 (May 1993) pp. 493–498.
17. Kieffer, J., Cahill, A. and James, M., "Robust and accurate time-optimal path-tracking control for robot manipulators," IEEE Trans. Robot. Autom. 13 (6), 880–890 (Dec. 1997).
18. Akella, S. and Peng, J., "Time-scaled Coordination of Multiple Manipulators," Proceedings of the 2004 IEEE International Conference on Robotics and Automation, ICRA '04, vol. 4 (Apr. 2004) pp. 3337–3344.
19. Michna, V., Wagner, P. and Cernohorsky, J., "Constrained Optimization of Robot Trajectory and Obstacle Avoidance," Proceedings of the 2010 IEEE Conference on Emerging Technologies and Factory Automation (ETFA) (Sep. 2010) pp. 1–4.
20. Zhao, Y. and Tsiotras, P., "Speed Profile Optimization for Optimal Path Tracking," American Control Conference (ACC) (Jun. 2013) pp. 1171–1176.
21. Ijspeert, A. J., Nakanishi, J., Hoffmann, H., Pastor, P. and Schaal, S., "Dynamical movement primitives: Learning attractor models for motor behaviors," Neural Comput. 25 (2), 328–373 (2013).
22. Schaal, S., Mohajerian, P. and Ijspeert, A., "Dynamics systems versus optimal control – a unifying view," Prog. Brain Res. 165 (6), 425–445 (2007).
23. Moore, K. L., Chen, Y. and Ahn, H.-S., "Iterative Learning Control: A Tutorial and Big Picture View," Proceedings of the 45th IEEE Conference on Decision and Control (Dec. 2006) pp. 2352–2357.
24. Bristow, D., Tharayil, M. and Alleyne, A., "A survey of iterative learning control," IEEE Control Syst. Mag. 26 (3), 96–114 (Jun. 2006).
25. Kober, J., Bagnell, D. and Peters, J., "Reinforcement learning in robotics: A survey," Int. J. Robot. Res. 32 (11), 1238–1274 (2013).
26. Theodorou, E. A., Buchli, J. and Schaal, S., "A generalized path integral control approach to reinforcement learning," J. Mach. Learn. Res. 11 (11), 3137–3181 (2010).
27. Schreiber, G., Stemmer, A. and Bischoff, R., "The Fast Research Interface for the KUKA Lightweight Robot," IEEE Workshop on Innovative Robot Control Architectures for Demanding (Research) Applications – How to Modify and Enhance Commercial Controllers (ICRA 2010) (May 2010) pp. 73–77.
28. Ude, A., Nemec, B., Petrič, T. and Morimoto, J., "Orientation in Cartesian Space Dynamic Movement Primitives," International Conference on Robotics and Automation (ICRA), Hong Kong, China (2014) pp. 2997–3004.
29. Abend, W., Bizzi, E. and Morasso, P., "Human arm trajectory formation," Brain 105 (2), 331–348 (1982).
30. Uno, Y., Kawato, M. and Suzuki, R., "Formation and control of optimal trajectory in human multijoint arm movement. Minimum torque-change model," Biol. Cybern. 61 (2), 89–101 (1989).
31. Flash, T. and Hogan, N., "The coordination of arm movements: An experimentally confirmed mathematical model," J. Neurosci. 5 (7), 1688–1703 (1985).
32. Harris, C. M. and Wolpert, D. M., "Signal-dependent noise determines motor planning," Nature 394 (6695), 780–784 (1998).
33. Todorov, E. and Jordan, M. I., "Optimal feedback control as a theory of motor coordination," Nature Neurosci. 5 (11), 1226–1235 (2002).
34. Hogan, N., "Adaptive control of mechanical impedance by coactivation of antagonist muscles," IEEE Trans. Autom. Control 29 (8), 681–690 (Aug. 1984).
35. Vuga, R., Nemec, B. and Ude, A., "Speed Profile Optimization through Directed Explorative Learning," IEEE-RAS 14th International Conference on Humanoid Robots (Humanoids 2014) (Nov. 2014) pp. 547–553.