Abstract
We present an algorithm that learns an online trajectory generator able to generalize over varying and uncertain dynamics. When the dynamics are certain, the algorithm generalizes across model parameters; when the dynamics are partially observable, it generalizes across different observations. To do this, we employ recent advances in supervised imitation learning to learn a trajectory generator from a set of example trajectories computed by a trajectory optimizer. In experiments in two simulated domains, the learned generator finds solutions that are nearly as good as, and sometimes better than, those obtained by calling the trajectory optimizer online, while the online execution time is dramatically decreased and the offline training time remains reasonable.
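The recipe the abstract outlines, calling a trajectory optimizer offline on sampled model parameters and fitting a supervised model that maps those parameters to trajectories, can be illustrated with a minimal sketch. Everything below (the pendulum dynamics, the cost function, the choice of SciPy's L-BFGS-B as the offline optimizer, and a random-forest regressor as the learner) is an illustrative assumption, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.ensemble import RandomForestRegressor

H, DT = 20, 0.05  # horizon length and integration step (assumed values)

def rollout_cost(u, mass, target=np.pi):
    """Cost of an open-loop torque sequence u on a unit-length pendulum of the given mass."""
    theta, omega, cost = 0.0, 0.0, 0.0
    for t in range(H):
        alpha = (u[t] - 9.81 * mass * np.sin(theta)) / mass  # torque minus gravity
        omega += alpha * DT
        theta += omega * DT
        cost += (theta - target) ** 2 + 1e-3 * u[t] ** 2  # reach the target angle cheaply
    return cost

def optimize_trajectory(mass):
    """Offline 'expert': run a local trajectory optimizer for one value of the model parameter."""
    res = minimize(rollout_cost, np.zeros(H), args=(mass,), method="L-BFGS-B")
    return res.x  # optimized open-loop control trajectory

# Offline phase: sample model parameters and record the optimizer's trajectory for each.
masses = np.linspace(0.5, 2.0, 40)
X = masses.reshape(-1, 1)                               # inputs: model parameters
Y = np.array([optimize_trajectory(m) for m in masses])  # targets: full trajectories

# Fit the trajectory generator; a multi-output random forest is one plausible choice.
generator = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, Y)

# Online phase: generating a trajectory for an unseen parameter is a single prediction,
# with no call to the trajectory optimizer at execution time.
u_new = generator.predict(np.array([[1.3]]))[0]
print("cost of predicted trajectory for mass 1.3:", rollout_cost(u_new, 1.3))
```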
Notes
- 1. The video of this can be found at: https://www.youtube.com/watch?v=r9o0pUIXV6w.
- 2.
Acknowledgements
This work was supported in part by the NSF (grant 1420927). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. We also gratefully acknowledge support from the ONR (grant N00014-14-1-0486), from the AFOSR (grant FA23861014135), and from the ARO (grant W911NF1410433).
Copyright information
© 2018 Springer International Publishing AG
Cite this chapter
Kim, B., Kim, A., Dai, H., Kaelbling, L., Lozano-Perez, T. (2018). Generalizing Over Uncertain Dynamics for Online Trajectory Generation. In: Bicchi, A., Burgard, W. (eds) Robotics Research. Springer Proceedings in Advanced Robotics, vol 3. Springer, Cham. https://doi.org/10.1007/978-3-319-60916-4_3
DOI: https://doi.org/10.1007/978-3-319-60916-4_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60915-7
Online ISBN: 978-3-319-60916-4
eBook Packages: Engineering (R0)