
A Dynamic Regret Analysis and Adaptive Regularization Algorithm for On-Policy Robot Imitation Learning

  • Conference paper
  • In: Algorithmic Foundations of Robotics XIII (WAFR 2018)

Part of the book series: Springer Proceedings in Advanced Robotics (SPAR, volume 14)


Abstract

On-policy imitation learning algorithms such as Dagger evolve a robot control policy by executing it, measuring performance (loss), obtaining corrective feedback from a supervisor, and generating the next policy. As the loss between iterations can vary unpredictably, a fundamental question is under what conditions this process will eventually achieve a converged policy. If one assumes the underlying trajectory distribution is static (stationary), it is possible to prove convergence for Dagger. Cheng and Boots (2018) consider the more realistic model for robotics where the underlying trajectory distribution, which is a function of the policy, is dynamic, and show that convergence can be proven when a condition on the rate of change of the trajectory distributions is satisfied. In this paper, we reframe that result using dynamic regret theory from the field of Online Optimization to prove convergence to locally optimal policies for Dagger, Imitation Gradient, and Multiple Imitation Gradient. These results inspire a new algorithm, Adaptive On-Policy Regularization (AOR), that ensures the conditions for convergence. We present simulation results with cart-pole balancing and walker locomotion benchmarks that suggest AOR can significantly decrease dynamic regret and chattering. To our knowledge, this is the first application of dynamic regret theory to imitation learning.
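The on-policy loop described in the abstract (execute the current policy, query the supervisor on the visited states, refit, repeat) can be made concrete with a short sketch. The code below is an illustrative assumption, not the authors' implementation: it uses a linear policy, a least-squares refit with a proximal regularization term toward the previous parameters, and a simple doubling/halving rule for the regularization weight as a stand-in for AOR's adaptive update. The rollout and supervisor callables are hypothetical interfaces to a simulator and an expert.

    # Minimal sketch (assumed interfaces, not the paper's reference implementation):
    # a DAgger-style on-policy imitation loop with an adaptive regularization weight
    # in the spirit of Adaptive On-Policy Regularization (AOR).
    import numpy as np

    def on_policy_imitation(rollout, supervisor, dim_state, dim_action,
                            iterations=50, lam=1.0):
        """rollout(W) -> visited states (n, dim_state) under the policy u = W s;
        supervisor(states) -> corrective actions (n, dim_action)."""
        W = np.zeros((dim_action, dim_state))      # linear policy parameters
        states, actions = [], []                   # aggregated dataset (DAgger-style)
        for _ in range(iterations):
            S = rollout(W)                         # 1. execute the current policy
            A = supervisor(S)                      # 2. query corrective supervisor labels
            states.append(S)
            actions.append(A)
            X = np.vstack(states)
            Y = np.vstack(actions)
            # 3. regularized refit: least squares plus lam * ||W - W_prev||^2,
            #    which limits how fast the policy (and hence its trajectory
            #    distribution) can change between iterations.
            W_prev = W
            G = X.T @ X + lam * np.eye(dim_state)
            W = np.linalg.solve(G, X.T @ Y + lam * W_prev.T).T
            # 4. adapt lam: if the policy moved a lot, tighten the regularization;
            #    otherwise relax it (a stand-in for AOR's adaptive rule).
            step = np.linalg.norm(W - W_prev)
            lam = min(lam * 2.0, 1e6) if step > 1.0 else max(lam / 2.0, 1e-3)
        return W

In a cart-pole or walker benchmark, rollout would execute the current policy in the simulator and return the visited states, and supervisor would be a pre-trained or analytic controller providing corrective actions. The proximal term keeps successive policies, and hence their trajectory distributions, from changing too quickly, which is exactly the kind of rate condition the convergence analysis relies on.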


References

  1. Bagnell, J.A.: An invitation to imitation. Technical report, Carnegie Mellon University, Robotics Institute, Pittsburgh (2015)

  2. Cheng, C.-A., Boots, B.: Convergence of value aggregation for imitation learning. In: International Conference on Artificial Intelligence and Statistics (2018)

  3. Cheng, C.-A., Yan, X., Theodorou, E., Boots, B.: Accelerating imitation learning with predictive models. arXiv preprint arXiv:1806.04642 (2018)

  4. Cheng, C.-A., Yan, X., Wagener, N., Boots, B.: Fast policy learning through imitation and reinforcement. In: Conference on Uncertainty in Artificial Intelligence (2018)

  5. Hall, E.C., Willett, R.M.: Online convex optimization in dynamic environments. IEEE J. Sel. Top. Signal Process. 9(4), 647–662 (2015)

  6. Hazan, E.: Introduction to online convex optimization. Found. Trends Optim. 2(3–4), 157–325 (2016)

  7. Hazan, E., Seshadhri, C.: Adaptive algorithms for online decision problems. In: Electronic Colloquium on Computational Complexity (ECCC) (2007)

  8. Laskey, M., Chuck, C., Lee, J., Mahler, J., Krishnan, S., Jamieson, K., Dragan, A., Goldberg, K.: Comparing human-centric and robot-centric sampling for robot deep learning from demonstrations. In: IEEE International Conference on Robotics and Automation (ICRA) (2017)

  9. Mokhtari, A., Shahrampour, S., Jadbabaie, A., Ribeiro, A.: Online optimization in dynamic environments: improved regret rates for strongly convex problems. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 7195–7201. IEEE (2016)

  10. Osa, T., Pajarinen, J., Neumann, G., Bagnell, J.A., Abbeel, P., Peters, J.: An algorithmic perspective on imitation learning. Found. Trends Robot. 7(1–2), 1–179 (2018)

  11. Rakhlin, S., Sridharan, K.: Optimization, learning, and games with predictable sequences. In: Advances in Neural Information Processing Systems, pp. 3066–3074 (2013)

  12. Ross, S., Gordon, G.J., Bagnell, J.A.: A reduction of imitation learning and structured prediction to no-regret online learning. In: International Conference on Artificial Intelligence and Statistics (2011)

  13. Sun, W., Venkatraman, A., Gordon, G.J., Boots, B., Bagnell, J.A.: Deeply AggreVaTeD: differentiable imitation learning for sequential prediction. In: International Conference on Machine Learning (2017)

  14. Yang, T., Zhang, L., Jin, R., Yi, J.: Tracking slowly moving clairvoyant: optimal dynamic regret of online learning with true and noisy gradient. In: International Conference on Machine Learning (2016)

  15. Zhang, L., Yang, T., Yi, J., Jin, R., Zhou, Z.-H.: Improved dynamic regret for non-degenerate functions. In: Advances in Neural Information Processing Systems (2017)

  16. Zinkevich, M.: Online convex programming and generalized infinitesimal gradient ascent. In: Proceedings of the 20th International Conference on Machine Learning (ICML 2003), pp. 928–936 (2003)


Acknowledgments

This research was performed at the AUTOLAB at UC Berkeley in affiliation with the Berkeley AI Research (BAIR) Lab, Berkeley Deep Drive (BDD), the Real-Time Intelligent Secure Execution (RISE) Lab, and the CITRIS “People and Robots” (CPAR) Initiative and by the Scalable Collaborative Human-Robot Learning (SCHooL) Project, NSF National Robotics Initiative Award 1734633. The authors were supported in part by donations from Siemens, Google, Amazon Robotics, Toyota Research Institute, Autodesk, ABB, Samsung, Knapp, Loccioni, Honda, Intel, Comcast, Cisco, and Hewlett-Packard. We thank the WAFR community for their valuable comments and our colleagues, in particular Jeffrey Mahler, Ching-An Cheng, and Brijen Thananjeyan for their insights.

Author information

Corresponding author: Jonathan N. Lee


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Lee, J.N., Laskey, M., Tanwani, A.K., Aswani, A., Goldberg, K. (2020). A Dynamic Regret Analysis and Adaptive Regularization Algorithm for On-Policy Robot Imitation Learning. In: Morales, M., Tapia, L., Sánchez-Ante, G., Hutchinson, S. (eds) Algorithmic Foundations of Robotics XIII. WAFR 2018. Springer Proceedings in Advanced Robotics, vol 14. Springer, Cham. https://doi.org/10.1007/978-3-030-44051-0_13
