Abstract
Imitation learning algorithms have been interpreted as variants of divergence minimization problems. The ability to compare occupancy measures between experts and learners is crucial to their effectiveness in learning from demonstrations. In this paper, we present tractable solutions by formulating imitation learning as minimization of the Sinkhorn distance between the occupancy measures of the expert and the learner. The formulation combines the valuable properties of optimal transport metrics in comparing non-overlapping distributions with a cosine distance cost defined in an adversarially learned feature space. This leads to a highly discriminative critic network and an optimal transport plan that subsequently guide imitation learning. We evaluate the proposed approach using both the reward metric and the Sinkhorn distance metric on a number of MuJoCo experiments. For the implementation and for reproducing our results, please refer to the repository at https://github.com/gpapagiannis/sinkhorn-imitation.
G. Papagiannis—Work done as a student at the University of Surrey.
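To make the objective concrete, the following is a minimal sketch of the quantity the abstract refers to: an entropy-regularized optimal transport cost (the Sinkhorn distance, in the sense of Cuturi, 2013) between two uniform empirical measures, with a cosine distance ground cost computed in a feature space. The function names (`cosine_cost_matrix`, `sinkhorn_distance`), the regularization weight, the iteration count, and the random stand-in features are illustrative assumptions, not the paper's implementation; in the method itself the features come from an adversarially trained critic network, and the resulting transport plan and costs guide the policy update.

```python
import numpy as np

def cosine_cost_matrix(X, Y, eps=1e-8):
    """Pairwise cosine distance c(x, y) = 1 - cos(x, y) between row vectors."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + eps)
    Yn = Y / (np.linalg.norm(Y, axis=1, keepdims=True) + eps)
    return 1.0 - Xn @ Yn.T

def sinkhorn_distance(C, reg=0.1, n_iters=200, eps=1e-9):
    """Entropy-regularized OT cost between two uniform empirical measures,
    solved by Sinkhorn fixed-point iteration on the scaling vectors u, v."""
    n, m = C.shape
    a = np.full(n, 1.0 / n)          # uniform weights on learner samples
    b = np.full(m, 1.0 / m)          # uniform weights on expert samples
    K = np.exp(-C / reg)             # Gibbs kernel of the cost matrix
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u + eps)      # rescale columns to match marginal b
        u = a / (K @ v + eps)        # rescale rows to match marginal a
    P = u[:, None] * K * v[None, :]  # optimal transport plan diag(u) K diag(v)
    return float(np.sum(P * C)), P

# Illustrative usage with random stand-ins for critic features of
# (state, action) pairs; in the paper these embeddings would come from
# the adversarially learned critic.
rng = np.random.default_rng(0)
learner_feats = rng.normal(size=(64, 32))
expert_feats = rng.normal(size=(64, 32))
cost, plan = sinkhorn_distance(cosine_cost_matrix(learner_feats, expert_feats))
print(f"Sinkhorn distance: {cost:.4f}")
```

In an adversarial training loop of the kind the abstract describes, one would alternate between updating the critic to sharpen this distance and updating the policy to reduce it; the sketch above shows only the distance and transport-plan computation.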
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Papagiannis, G., Li, Y. (2023). Imitation Learning with Sinkhorn Distances. In: Amini, M.-R., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol. 13716. Springer, Cham. https://doi.org/10.1007/978-3-031-26412-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26411-5
Online ISBN: 978-3-031-26412-2
eBook Packages: Computer Science, Computer Science (R0)