ABSTRACT
Offline imitation learning (OIL) is often used to solve complex continuous decision-making tasks such as robot control and autonomous driving, where it is either difficult to design an effective reward for learning or expensive and time-consuming for agents to collect data by interacting with the environment. However, the data used in previous OIL methods were all gathered by reinforcement learning algorithms guided by task-specific rewards; these methods therefore do not operate under a truly reward-free premise and still suffer from the difficulty of designing an effective reward function for real tasks. To this end, we propose the reward-free exploratory data driven offline imitation learning (ExDOIL) framework. ExDOIL first trains an unsupervised reinforcement learning agent by interacting with the environment and collects sufficient unsupervised exploration data during training; a task-independent yet simple and efficient reward function is then used to relabel the collected data; finally, an agent is trained to imitate the expert and complete the task through a conventional RL algorithm such as TD3. Extensive experiments on continuous control tasks demonstrate that the proposed framework achieves better imitation performance (28% higher episode returns on average) compared with the previous state-of-the-art method, ORIL, without any task-specific rewards.
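To make the three-stage pipeline concrete, here is a minimal, self-contained sketch in Python. It is illustrative only: the helper names (`collect_exploratory_data`, `relabel`), the toy dynamics, and the binary proximity-to-expert reward are assumptions made for exposition, not the actual ExDOIL implementation; stage 3 (offline TD3 training) is left as a stub.

```python
# Illustrative sketch of the three-stage ExDOIL pipeline (assumptions,
# not the authors' implementation): (1) collect reward-free exploratory
# data, (2) relabel it with a simple task-independent reward, (3) train
# an offline RL agent on the relabeled transitions.
import numpy as np

def collect_exploratory_data(env_step, explore_policy, n, state_dim=3):
    """Stage 1: roll out an unsupervised exploration policy (e.g., a
    curiosity- or entropy-driven agent) to gather reward-free transitions."""
    data, s = [], np.zeros(state_dim)
    for _ in range(n):
        a = explore_policy(s)
        s_next = env_step(s, a)
        data.append((s, a, s_next))
        s = s_next
    return data

def relabel(data, expert_states, eps=0.5):
    """Stage 2: task-independent relabeling. The binary rule used here
    (reward 1 if the next state lies within eps of any expert state,
    else 0) is one plausible simple choice, assumed for illustration."""
    labeled = []
    for s, a, s_next in data:
        dist = np.linalg.norm(expert_states - s_next, axis=1).min()
        labeled.append((s, a, float(dist < eps), s_next))
    return labeled

# Stage 3: train a conventional off-policy algorithm such as TD3 on the
# relabeled dataset, purely offline; omitted here for brevity.

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    env_step = lambda s, a: s + 0.1 * a            # toy linear dynamics
    explore_policy = lambda s: rng.normal(size=s.shape)
    expert_states = rng.normal(size=(50, 3))       # stand-in expert states
    data = collect_exploratory_data(env_step, explore_policy, 1000)
    labeled = relabel(data, expert_states)
    print("fraction of positive labels:",
          np.mean([r for _, _, r, _ in labeled]))
```

Under such a binary scheme, the relabeled dataset plays a role similar to the sparse rewards used in SQIL-style imitation: the offline agent is driven toward expert-visited states without ever observing a task-specific reward.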
REFERENCES
- Pieter Abbeel and Andrew Y. Ng. 2004. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the Twenty-First International Conference on Machine Learning. 1.
- Martin Arjovsky, Soumith Chintala, and Léon Bottou. 2017. Wasserstein generative adversarial networks. In International Conference on Machine Learning. PMLR, 214–223.
- Michael Bain and Claude Sammut. 1995. A framework for behavioural cloning. In Machine Intelligence 15. 103–129.
- Yuri Burda, Harrison Edwards, Amos Storkey, and Oleg Klimov. 2018. Exploration by random network distillation. arXiv preprint arXiv:1810.12894 (2018).
- Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, and Armand Joulin. 2020. Unsupervised learning of visual features by contrasting cluster assignments. Advances in Neural Information Processing Systems 33 (2020), 9912–9924.
- Kamil Ciosek. 2021. Imitation learning by reinforcement learning. arXiv preprint arXiv:2108.04763 (2021).
- Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, and Sergey Levine. 2018. Diversity is all you need: Learning skills without a reward function. arXiv preprint arXiv:1802.06070 (2018).
- Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, and Sergey Levine. 2020. D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219 (2020).
- Scott Fujimoto and Shixiang Shane Gu. 2021. A minimalist approach to offline reinforcement learning. Advances in Neural Information Processing Systems 34 (2021), 20132–20145.
- Scott Fujimoto, Herke van Hoof, and David Meger. 2018. Addressing function approximation error in actor-critic methods. In International Conference on Machine Learning. PMLR, 1587–1596.
- Scott Fujimoto, David Meger, and Doina Precup. 2019. Off-policy deep reinforcement learning without exploration. In International Conference on Machine Learning. PMLR, 2052–2062.
- Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Thomas Paine, Sergio Gómez, Konrad Zolna, Rishabh Agarwal, Josh S. Merel, Daniel J. Mankowitz, Cosmin Paduraru, et al. 2020. RL Unplugged: A suite of benchmarks for offline reinforcement learning. Advances in Neural Information Processing Systems 33 (2020), 7248–7259.
- Jonathan Ho and Stefano Ermon. 2016. Generative adversarial imitation learning. Advances in Neural Information Processing Systems 29 (2016).
- Naveen Kodali, Jacob Abernethy, James Hays, and Zsolt Kira. 2017. On convergence and stability of GANs. arXiv preprint arXiv:1705.07215 (2017).
- Ilya Kostrikov, Ofir Nachum, and Jonathan Tompson. 2019. Imitation learning via off-policy distribution matching. arXiv preprint arXiv:1912.05032 (2019).
- Aviral Kumar, Aurick Zhou, George Tucker, and Sergey Levine. 2020. Conservative Q-learning for offline reinforcement learning. Advances in Neural Information Processing Systems 33 (2020), 1179–1191.
- Misha Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, and Aravind Srinivas. 2020. Reinforcement learning with augmented data. Advances in Neural Information Processing Systems 33 (2020), 19884–19895.
- Michael Laskin, Denis Yarats, Hao Liu, Kimin Lee, Albert Zhan, Kevin Lu, Catherine Cang, Lerrel Pinto, and Pieter Abbeel. 2021. URLB: Unsupervised reinforcement learning benchmark. arXiv preprint arXiv:2110.15191 (2021).
- Lisa Lee, Benjamin Eysenbach, Emilio Parisotto, Eric Xing, Sergey Levine, and Ruslan Salakhutdinov. 2019. Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019).
- Hao Liu and Pieter Abbeel. 2021. APS: Active pretraining with successor features. In International Conference on Machine Learning. PMLR, 6736–6747.
- Hao Liu and Pieter Abbeel. 2021. Behavior from the void: Unsupervised active pre-training. Advances in Neural Information Processing Systems 34 (2021), 18459–18473.
- Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, and Trevor Darrell. 2017. Curiosity-driven exploration by self-supervised prediction. In International Conference on Machine Learning. PMLR, 2778–2787.
- Deepak Pathak, Dhiraj Gandhi, and Abhinav Gupta. 2019. Self-supervised exploration via disagreement. In International Conference on Machine Learning. PMLR, 5062–5071.
- Siddharth Reddy, Anca D. Dragan, and Sergey Levine. 2019. SQIL: Imitation learning via reinforcement learning with sparse rewards. arXiv preprint arXiv:1905.11108 (2019).
- Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, et al. 2018. DeepMind Control Suite. arXiv preprint arXiv:1801.00690 (2018).
- Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. MIT Press.
- Ziyu Wang, Alexander Novikov, Konrad Zolna, Josh S. Merel, Jost Tobias Springenberg, Scott E. Reed, Bobak Shahriari, Noah Siegel, Caglar Gulcehre, Nicolas Heess, et al. 2020. Critic regularized regression. Advances in Neural Information Processing Systems 33 (2020), 7768–7778.
- Denis Yarats, David Brandfonbrener, Hao Liu, Michael Laskin, Pieter Abbeel, Alessandro Lazaric, and Lerrel Pinto. 2022. Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning. arXiv preprint arXiv:2201.13425 (2022).
- Denis Yarats, Rob Fergus, Alessandro Lazaric, and Lerrel Pinto. 2021. Reinforcement learning with prototypical representations. In International Conference on Machine Learning. PMLR, 11920–11931.
- Konrad Zolna, Alexander Novikov, Ksenia Konyushkova, Caglar Gulcehre, Ziyu Wang, Yusuf Aytar, Misha Denil, Nando de Freitas, and Scott Reed. 2020. Offline learning from demonstrations and unlabeled experience. arXiv preprint arXiv:2011.13885 (2020).