
A Dynamic Regret Analysis and Adaptive Regularization Algorithm for On-Policy Robot Imitation Learning

  • Conference paper
  • In: Algorithmic Foundations of Robotics XIII (WAFR 2018)

Part of the book series: Springer Proceedings in Advanced Robotics (SPAR, volume 14)


Abstract

On-policy imitation learning algorithms such as Dagger evolve a robot control policy by executing it, measuring performance (loss), obtaining corrective feedback from a supervisor, and generating the next policy. As the loss between iterations can vary unpredictably, a fundamental question is under what conditions this process will eventually achieve a converged policy. If one assumes the underlying trajectory distribution is static (stationary), it is possible to prove convergence for Dagger. Cheng and Boots (2018) consider the more realistic model for robotics where the underlying trajectory distribution, which is a function of the policy, is dynamic, and show that convergence can be proven when a condition on the rate of change of the trajectory distributions is satisfied. In this paper, we reframe that result using dynamic regret theory from the field of Online Optimization to prove convergence to locally optimal policies for Dagger, Imitation Gradient, and Multiple Imitation Gradient. These results inspire a new algorithm, Adaptive On-Policy Regularization (AOR), that ensures the conditions for convergence. We present simulation results with cart-pole balancing and walker locomotion benchmarks that suggest AOR can significantly decrease dynamic regret and chattering. To our knowledge, this is the first application of dynamic regret theory to imitation learning.
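The on-policy loop described in the abstract (execute the current policy, query the supervisor on the visited states, refit, repeat) can be made concrete with a short sketch. The code below is an illustrative assumption, not the authors' implementation: it uses a linear policy, a least-squares refit with a proximal regularization term toward the previous parameters, and a simple doubling/halving rule for the regularization weight as a stand-in for AOR's adaptive update. The rollout and supervisor callables are hypothetical interfaces to a simulator and an expert.

    # Minimal sketch (assumed interfaces, not the paper's reference implementation):
    # a DAgger-style on-policy imitation loop with an adaptive regularization weight
    # in the spirit of Adaptive On-Policy Regularization (AOR).
    import numpy as np

    def on_policy_imitation(rollout, supervisor, dim_state, dim_action,
                            iterations=50, lam=1.0):
        """rollout(W) -> visited states (n, dim_state) under the policy u = W s;
        supervisor(states) -> corrective actions (n, dim_action)."""
        W = np.zeros((dim_action, dim_state))      # linear policy parameters
        states, actions = [], []                   # aggregated dataset (DAgger-style)
        for _ in range(iterations):
            S = rollout(W)                         # 1. execute the current policy
            A = supervisor(S)                      # 2. query corrective supervisor labels
            states.append(S)
            actions.append(A)
            X = np.vstack(states)
            Y = np.vstack(actions)
            # 3. regularized refit: least squares plus lam * ||W - W_prev||^2,
            #    which limits how fast the policy (and hence its trajectory
            #    distribution) can change between iterations.
            W_prev = W
            G = X.T @ X + lam * np.eye(dim_state)
            W = np.linalg.solve(G, X.T @ Y + lam * W_prev.T).T
            # 4. adapt lam: if the policy moved a lot, tighten the regularization;
            #    otherwise relax it (a stand-in for AOR's adaptive rule).
            step = np.linalg.norm(W - W_prev)
            lam = min(lam * 2.0, 1e6) if step > 1.0 else max(lam / 2.0, 1e-3)
        return W

In a cart-pole or walker benchmark, rollout would execute the current policy in the simulator and return the visited states, and supervisor would be a pre-trained or analytic controller providing corrective actions. The proximal term keeps successive policies, and hence their trajectory distributions, from changing too quickly, which is exactly the kind of rate condition the convergence analysis relies on.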


References

  1. Bagnell, J.A.: An invitation to imitation. Technical report, Carnegie Mellon University, Robotics Institute, Pittsburgh (2015)

  2. Cheng, C.-A., Boots, B.: Convergence of value aggregation for imitation learning. In: International Conference on Artificial Intelligence and Statistics (2018)

  3. Cheng, C.-A., Yan, X., Theodorou, E., Boots, B.: Accelerating imitation learning with predictive models. arXiv preprint arXiv:1806.04642 (2018)

  4. Cheng, C.-A., Yan, X., Wagener, N., Boots, B.: Fast policy learning through imitation and reinforcement. In: Conference on Uncertainty in Artificial Intelligence (2018)

  5. Hall, E.C., Willett, R.M.: Online convex optimization in dynamic environments. IEEE J. Sel. Top. Signal Process. 9(4), 647–662 (2015)

  6. Hazan, E.: Introduction to online convex optimization. Found. Trends Optim. 2(3–4), 157–325 (2016)

  7. Hazan, E., Seshadhri, C.: Adaptive algorithms for online decision problems. In: Electronic Colloquium on Computational Complexity (ECCC) (2007)

  8. Laskey, M., Chuck, C., Lee, J., Mahler, J., Krishnan, S., Jamieson, K., Dragan, A., Goldberg, K.: Comparing human-centric and robot-centric sampling for robot deep learning from demonstrations. In: IEEE International Conference on Robotics and Automation (ICRA) (2017)

  9. Mokhtari, A., Shahrampour, S., Jadbabaie, A., Ribeiro, A.: Online optimization in dynamic environments: improved regret rates for strongly convex problems. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 7195–7201. IEEE (2016)

  10. Osa, T., Pajarinen, J., Neumann, G., Bagnell, J.A., Abbeel, P., Peters, J.: An algorithmic perspective on imitation learning. Found. Trends Robot. 7(1–2), 1–179 (2018)

  11. Rakhlin, S., Sridharan, K.: Optimization, learning, and games with predictable sequences. In: Advances in Neural Information Processing Systems, pp. 3066–3074 (2013)

  12. Ross, S., Gordon, G.J., Bagnell, J.A.: A reduction of imitation learning and structured prediction to no-regret online learning. In: International Conference on Artificial Intelligence and Statistics (2011)

  13. Sun, W., Venkatraman, A., Gordon, G.J., Boots, B., Bagnell, J.A.: Deeply AggreVaTeD: differentiable imitation learning for sequential prediction. In: International Conference on Machine Learning (2017)

  14. Yang, T., Zhang, L., Jin, R., Yi, J.: Tracking slowly moving clairvoyant: optimal dynamic regret of online learning with true and noisy gradient. In: International Conference on Machine Learning (2016)

  15. Zhang, L., Yang, T., Yi, J., Jin, R., Zhou, Z.-H.: Improved dynamic regret for non-degenerate functions. In: Advances in Neural Information Processing Systems (2017)

  16. Zinkevich, M.: Online convex programming and generalized infinitesimal gradient ascent. In: Proceedings of the 20th International Conference on Machine Learning (ICML 2003), pp. 928–936 (2003)


Acknowledgments

This research was performed at the AUTOLAB at UC Berkeley in affiliation with the Berkeley AI Research (BAIR) Lab, Berkeley Deep Drive (BDD), the Real-Time Intelligent Secure Execution (RISE) Lab, and the CITRIS “People and Robots” (CPAR) Initiative and by the Scalable Collaborative Human-Robot Learning (SCHooL) Project, NSF National Robotics Initiative Award 1734633. The authors were supported in part by donations from Siemens, Google, Amazon Robotics, Toyota Research Institute, Autodesk, ABB, Samsung, Knapp, Loccioni, Honda, Intel, Comcast, Cisco, and Hewlett-Packard. We thank the WAFR community for their valuable comments and our colleagues, in particular Jeffrey Mahler, Ching-An Cheng, and Brijen Thananjeyan for their insights.

Author information

Corresponding author: Jonathan N. Lee


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Lee, J.N., Laskey, M., Tanwani, A.K., Aswani, A., Goldberg, K. (2020). A Dynamic Regret Analysis and Adaptive Regularization Algorithm for On-Policy Robot Imitation Learning. In: Morales, M., Tapia, L., Sánchez-Ante, G., Hutchinson, S. (eds) Algorithmic Foundations of Robotics XIII. WAFR 2018. Springer Proceedings in Advanced Robotics, vol 14. Springer, Cham. https://doi.org/10.1007/978-3-030-44051-0_13
