DOI: 10.1145/3579654.3579753
research-article

Offline Imitation Learning Using Reward-free Exploratory Data

Published: 14 March 2023

ABSTRACT

Offline imitation learning (OIL) is often used to solve complex continuous decision-making tasks. For tasks such as robot control and autonomous driving, it is either difficult to design an effective reward for learning, or expensive and time-consuming for agents to collect data by interacting with the environment. However, the data used in previous OIL methods are all gathered by reinforcement learning algorithms guided by task-specific rewards, which violates the reward-free premise and still suffers from the difficulty of designing an effective reward function for real tasks. To this end, we propose the reward-free exploratory data driven offline imitation learning (ExDOIL) framework. ExDOIL first trains an unsupervised reinforcement learning agent by interacting with the environment and collects sufficient unsupervised exploration data during training; then, a task-independent yet simple and efficient reward function is used to relabel the collected data; finally, an agent is trained to imitate the expert and complete the task with a conventional RL algorithm such as TD3. Extensive experiments on continuous control tasks demonstrate that the proposed framework achieves better imitation performance (28% higher episode returns on average) compared with the previous SOTA method (ORIL), without any task-specific rewards.
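
The abstract describes ExDOIL as a three-stage pipeline: (1) collect reward-free data with an unsupervised exploration agent, (2) relabel that data with a task-independent reward, and (3) train offline with a conventional RL algorithm such as TD3. The Python sketch below only illustrates that pipeline under stated assumptions: the function names are hypothetical rather than the authors' code, and the 0/1 expert-versus-exploratory reward is borrowed from imitation-via-RL methods such as SQIL, which may differ from the paper's actual relabeling function.

import numpy as np

def collect_exploratory_data(num_transitions, obs_dim, act_dim, seed=0):
    """Stand-in for reward-free transitions gathered by an unsupervised
    exploration agent; here we simply draw random transitions."""
    rng = np.random.default_rng(seed)
    return {
        "obs": rng.normal(size=(num_transitions, obs_dim)),
        "act": rng.normal(size=(num_transitions, act_dim)),
        "next_obs": rng.normal(size=(num_transitions, obs_dim)),
    }

def relabel_with_task_independent_reward(exploratory, expert):
    """Assumed relabeling: expert transitions get reward 1, exploratory get 0."""
    data = {k: np.concatenate([exploratory[k], expert[k]]) for k in exploratory}
    data["rew"] = np.concatenate([
        np.zeros(len(exploratory["obs"])),
        np.ones(len(expert["obs"])),
    ])
    return data

def train_offline_td3(dataset, num_steps=1000, batch_size=256):
    """Placeholder for an off-the-shelf offline TD3 (or TD3+BC) learner."""
    rng = np.random.default_rng(0)
    for _ in range(num_steps):
        idx = rng.integers(0, len(dataset["obs"]), size=batch_size)  # sample a minibatch
        batch = {k: v[idx] for k, v in dataset.items()}
        # actor-critic updates on (obs, act, rew, next_obs) would go here
    return "trained_policy"

if __name__ == "__main__":
    exploratory = collect_exploratory_data(10_000, obs_dim=17, act_dim=6)
    expert = collect_exploratory_data(1_000, obs_dim=17, act_dim=6, seed=1)  # stand-in expert demos
    dataset = relabel_with_task_independent_reward(exploratory, expert)
    policy = train_offline_td3(dataset)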

References

  1. Pieter Abbeel and Andrew Y Ng. 2004. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the Twenty-First International Conference on Machine Learning. 1.
  2. Martin Arjovsky, Soumith Chintala, and Léon Bottou. 2017. Wasserstein generative adversarial networks. In International Conference on Machine Learning. PMLR, 214–223.
  3. Michael Bain and Claude Sammut. 1995. A Framework for Behavioural Cloning. In Machine Intelligence 15. 103–129.
  4. Yuri Burda, Harrison Edwards, Amos Storkey, and Oleg Klimov. 2018. Exploration by random network distillation. arXiv preprint arXiv:1810.12894 (2018).
  5. Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, and Armand Joulin. 2020. Unsupervised learning of visual features by contrasting cluster assignments. Advances in Neural Information Processing Systems 33 (2020), 9912–9924.
  6. Kamil Ciosek. 2021. Imitation learning by reinforcement learning. arXiv preprint arXiv:2108.04763 (2021).
  7. Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, and Sergey Levine. 2018. Diversity is all you need: Learning skills without a reward function. arXiv preprint arXiv:1802.06070 (2018).
  8. Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, and Sergey Levine. 2020. D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219 (2020).
  9. Scott Fujimoto and Shixiang Shane Gu. 2021. A minimalist approach to offline reinforcement learning. Advances in Neural Information Processing Systems 34 (2021), 20132–20145.
  10. Scott Fujimoto, Herke van Hoof, and David Meger. 2018. Addressing function approximation error in actor-critic methods. In International Conference on Machine Learning. PMLR, 1587–1596.
  11. Scott Fujimoto, David Meger, and Doina Precup. 2019. Off-policy deep reinforcement learning without exploration. In International Conference on Machine Learning. PMLR, 2052–2062.
  12. Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Thomas Paine, Sergio Gómez, Konrad Zolna, Rishabh Agarwal, Josh S Merel, Daniel J Mankowitz, Cosmin Paduraru, et al. 2020. RL Unplugged: A suite of benchmarks for offline reinforcement learning. Advances in Neural Information Processing Systems 33 (2020), 7248–7259.
  13. Jonathan Ho and Stefano Ermon. 2016. Generative adversarial imitation learning. Advances in Neural Information Processing Systems 29 (2016).
  14. Naveen Kodali, Jacob Abernethy, James Hays, and Zsolt Kira. 2017. On convergence and stability of GANs. arXiv preprint arXiv:1705.07215 (2017).
  15. Ilya Kostrikov, Ofir Nachum, and Jonathan Tompson. 2019. Imitation learning via off-policy distribution matching. arXiv preprint arXiv:1912.05032 (2019).
  16. Aviral Kumar, Aurick Zhou, George Tucker, and Sergey Levine. 2020. Conservative Q-learning for offline reinforcement learning. Advances in Neural Information Processing Systems 33 (2020), 1179–1191.
  17. Misha Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, and Aravind Srinivas. 2020. Reinforcement learning with augmented data. Advances in Neural Information Processing Systems 33 (2020), 19884–19895.
  18. Michael Laskin, Denis Yarats, Hao Liu, Kimin Lee, Albert Zhan, Kevin Lu, Catherine Cang, Lerrel Pinto, and Pieter Abbeel. 2021. URLB: Unsupervised reinforcement learning benchmark. arXiv preprint arXiv:2110.15191 (2021).
  19. Lisa Lee, Benjamin Eysenbach, Emilio Parisotto, Eric Xing, Sergey Levine, and Ruslan Salakhutdinov. 2019. Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019).
  20. Hao Liu and Pieter Abbeel. 2021. APS: Active pretraining with successor features. In International Conference on Machine Learning. PMLR, 6736–6747.
  21. Hao Liu and Pieter Abbeel. 2021. Behavior from the void: Unsupervised active pre-training. Advances in Neural Information Processing Systems 34 (2021), 18459–18473.
  22. Deepak Pathak, Pulkit Agrawal, Alexei A Efros, and Trevor Darrell. 2017. Curiosity-driven exploration by self-supervised prediction. In International Conference on Machine Learning. PMLR, 2778–2787.
  23. Deepak Pathak, Dhiraj Gandhi, and Abhinav Gupta. 2019. Self-supervised exploration via disagreement. In International Conference on Machine Learning. PMLR, 5062–5071.
  24. Siddharth Reddy, Anca D Dragan, and Sergey Levine. 2019. SQIL: Imitation learning via reinforcement learning with sparse rewards. arXiv preprint arXiv:1905.11108 (2019).
  25. Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, et al. 2018. DeepMind Control Suite. arXiv preprint arXiv:1801.00690 (2018).
  26. R. Wang. 2006. Reinforcement Learning: An Introduction. In 2006 International Conference on Artificial Intelligence: 50 Years' Achievements, Future Directions and Social Impacts.
  27. Ziyu Wang, Alexander Novikov, Konrad Zolna, Josh S Merel, Jost Tobias Springenberg, Scott E Reed, Bobak Shahriari, Noah Siegel, Caglar Gulcehre, Nicolas Heess, et al. 2020. Critic regularized regression. Advances in Neural Information Processing Systems 33 (2020), 7768–7778.
  28. Denis Yarats, David Brandfonbrener, Hao Liu, Michael Laskin, Pieter Abbeel, Alessandro Lazaric, and Lerrel Pinto. 2022. Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning. arXiv preprint arXiv:2201.13425 (2022).
  29. Denis Yarats, Rob Fergus, Alessandro Lazaric, and Lerrel Pinto. 2021. Reinforcement learning with prototypical representations. In International Conference on Machine Learning. PMLR, 11920–11931.
  30. Konrad Zolna, Alexander Novikov, Ksenia Konyushkova, Caglar Gulcehre, Ziyu Wang, Yusuf Aytar, Misha Denil, Nando de Freitas, and Scott Reed. 2020. Offline learning from demonstrations and unlabeled experience. arXiv preprint arXiv:2011.13885 (2020).

        • Published in

          ACAI '22: Proceedings of the 2022 5th International Conference on Algorithms, Computing and Artificial Intelligence
          December 2022
          770 pages
          ISBN: 9781450398336
          DOI: 10.1145/3579654

          Copyright © 2022 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          Overall Acceptance Rate: 173 of 395 submissions, 44%
