  • Review Article

Transferring policy of deep reinforcement learning from simulation to reality for robotics

Abstract

Deep reinforcement learning has achieved great success in many fields and has shown promise for learning robust skills for robot control in recent years. However, sample efficiency and safety problems still limit its application to real-world robot control. One common solution is to train the control policy in a simulation environment and then transfer it to the real world. However, policies trained in simulation usually perform poorly in the real world because simulators inevitably model reality imperfectly. Inspired by biological transfer learning processes in the brains of humans and other animals, sim-to-real transfer reinforcement learning has been proposed and has become a focus of researchers applying reinforcement learning to robotics. Here, we describe state-of-the-art sim-to-real transfer reinforcement learning methods, which are inspired by insights into transfer learning in nature, such as extracting features common to the source and target tasks, enriching training experience, multitask learning, continual learning and fast learning. Our objective is to present a comprehensive survey of the most recent advances in sim-to-real transfer reinforcement learning, and we hope it facilitates the application of deep reinforcement learning to complex robot control problems in the real world.
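To make the problem concrete, below is a minimal, self-contained sketch of the sim-to-real workflow described above: a tabular Q-learning policy is trained on a toy one-dimensional reaching task whose 'simulator' ignores action slip, and is then evaluated zero-shot under 'real' dynamics that include it. The task, the slip parameter and every name in the code are illustrative assumptions, not code from any of the surveyed works; the difference between the two printed returns is the reality gap that sim-to-real methods try to close.

import numpy as np

# Toy 1-D reaching task: states 0..9, goal at 9, actions {0: left, 1: right}.
N_STATES, GOAL, MAX_STEPS = 10, 9, 50
rng = np.random.default_rng(0)

def step(state, action, slip):
    # With probability `slip` the commanded move has no effect
    # (an unmodelled friction/backlash effect standing in for the reality gap).
    if rng.random() > slip:
        state = int(np.clip(state + (1 if action == 1 else -1), 0, N_STATES - 1))
    reward = 1.0 if state == GOAL else -0.1
    return state, reward, state == GOAL

def q_learning(slip, episodes=2000, alpha=0.1, gamma=0.95, eps=0.1):
    # Tabular Q-learning under the given dynamics (a stand-in for any deep RL algorithm).
    Q = np.full((N_STATES, 2), 1.0)      # optimistic initialization drives exploration
    for _ in range(episodes):
        s = 0
        for _ in range(MAX_STEPS):
            a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
            s2, r, done = step(s, a, slip)
            Q[s, a] += alpha * (r + gamma * Q[s2].max() * (not done) - Q[s, a])
            s = s2
            if done:
                break
    return Q

def evaluate(Q, slip, episodes=200):
    # Average return of the greedy policy under the given dynamics.
    total = 0.0
    for _ in range(episodes):
        s = 0
        for _ in range(MAX_STEPS):
            s, r, done = step(s, int(Q[s].argmax()), slip)
            total += r
            if done:
                break
    return total / episodes

Q_sim = q_learning(slip=0.0)                                # cheap, safe training in the "simulator"
print("return in simulation:", evaluate(Q_sim, slip=0.0))
print("return in 'reality':", evaluate(Q_sim, slip=0.5))    # zero-shot transfer to slipping dynamics

Under the slipping dynamics the greedy policy needs roughly twice as many steps to reach the goal, so its return drops even though the task itself is unchanged; the methods surveyed below are different ways of shrinking exactly this drop.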

Fig. 1: Extracting representations common to the source and target domains via domain adaptation, for sim-to-real transfer of a policy trained with deep reinforcement learning.
Fig. 2: Domain randomization.
Fig. 3: An example of an inverse dynamics model method for policy training with GAT (ref. 24).
Fig. 4: Diagram of policy training with the meta-learning method MAML.
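Two of the figures above correspond to concrete training loops. For Fig. 2, domain randomization (refs. 16, 21, 22, 37) replaces the single nominal simulator with a distribution of simulators: dynamics or visual parameters are resampled every episode so the policy is trained to succeed across the whole range, which ideally contains the real system. A minimal sketch, reusing the toy task and the step/evaluate helpers defined after the abstract, and assuming the real slip value lies inside the randomized interval:

def q_learning_randomized(slip_range=(0.0, 0.6), episodes=2000,
                          alpha=0.1, gamma=0.95, eps=0.1):
    # Identical to q_learning above, except that the slip parameter is
    # resampled every episode: the agent never sees a single fixed simulator.
    Q = np.full((N_STATES, 2), 1.0)
    for _ in range(episodes):
        slip = rng.uniform(*slip_range)   # a freshly randomized "simulator"
        s = 0
        for _ in range(MAX_STEPS):
            a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
            s2, r, done = step(s, a, slip)
            Q[s, a] += alpha * (r + gamma * Q[s2].max() * (not done) - Q[s, a])
            s = s2
            if done:
                break
    return Q

Q_rand = q_learning_randomized()
print("randomized policy, return in 'reality':", evaluate(Q_rand, slip=0.5))

On this deliberately simple task the optimal action is the same for every slip value, so randomization mainly changes the value estimates rather than the behaviour; in realistic control problems, randomizing masses, friction, latencies or textures changes the learned policy itself, which is what makes zero-shot transfer possible.

For Fig. 4, MAML trains an initialization rather than a final policy: an inner loop adapts the parameters to each sampled task with a few gradient steps, and an outer loop updates the initialization so that this adaptation works well on average. The sketch below is an illustrative scalar version on a family of quadratic losses, not the reinforcement-learning formulation used in the surveyed works; for this quadratic family the second-order term in the meta-gradient reduces to the constant factor (1 - alpha).

# Minimal MAML-style inner/outer loop on tasks L_c(theta) = 0.5 * (theta - c)**2,
# where each task has its own optimum c. The meta-parameter theta is the
# initialization that one inner gradient step should adapt well to any task.
import numpy as np

rng_meta = np.random.default_rng(1)
alpha, beta = 0.1, 0.05                 # inner and outer learning rates
theta = 4.0                             # deliberately poor starting initialization

def loss_grad(theta, c):
    return theta - c                    # d/dtheta of 0.5 * (theta - c)**2

for _ in range(500):
    tasks = rng_meta.uniform(-5.0, 5.0, size=8)              # sample a batch of tasks
    meta_grad = 0.0
    for c in tasks:
        theta_adapted = theta - alpha * loss_grad(theta, c)  # inner adaptation step
        # Chain rule through the inner step: d(theta_adapted)/d(theta) = 1 - alpha here.
        meta_grad += loss_grad(theta_adapted, c) * (1.0 - alpha)
    theta -= beta * meta_grad / len(tasks)                   # outer (meta) update

print("meta-learned initialization:", theta)                 # converges near 0, the task-family centre

At deployment, the robot starts from the meta-learned initialization and runs the same inner-loop adaptation on a small amount of real-world data, which is how MAML-style methods (refs. 30, 56, 57) cope with the simulation-to-reality mismatch.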

References

  1. Sutton, R. & Barto, A. Reinforcement Learning: an Introduction (MIT Press, 2018).

  2. Kober, J., Bagnell, J. A. & Peters, J. Reinforcement learning in robotics: a survey. Int. J. Robot. Res. 32, 1238–1274 (2013).

  3. Dayan, P. & Niv, Y. Reinforcement learning: the good, the bad and the ugly. Curr. Opin. Neurobiol. 18, 185–196 (2008).

  4. Littman, M. L. Reinforcement learning improves behaviour from evaluative feedback. Nature 521, 445–451 (2015).

  5. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

  6. Angelov, P. & Soares, E. Towards explainable deep neural networks (xDNN). Neural Netw. 130, 185–194 (2020).

  7. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

  8. Schölkopf, B. Learning to see and act. Nature 518, 486–487 (2015).

  9. Google DeepMind. AlphaStar: Mastering the Real-Time Strategy Game StarCraft II https://www.deepmind.com/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii (2019).

  10. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

  11. Berner, C. et al. Dota 2 with large scale deep reinforcement learning. Preprint at https://arxiv.org/abs/1912.06680 (2019).

  12. Heess, N. et al. Emergence of locomotion behaviours in rich environments. Preprint at https://arxiv.org/abs/1707.02286 (2017).

  13. Florensa, C., Duan, Y. & Abbeel, P. Stochastic neural networks for hierarchical reinforcement learning. In Proc. International Conference on Learning Representations (ICLR) 1–10 (OpenReview.net, 2017).

  14. Rajeswaran, A. et al. Learning complex dexterous manipulation with deep reinforcement learning and demonstrations. In Proc. Robotics: Science and Systems (RSS) 1–9 (RSS foundation, 2018).

  15. Andrychowicz, M. et al. Learning dexterous in-hand manipulation. Int. J. Robot. Res. 39, 3–20 (2020).

  16. Peng, X. B., Andrychowicz, M., Zaremba, W. & Abbeel, P. Sim-to-real transfer of robotic control with dynamics randomization. In Proc. IEEE International Conference on Robotics and Automation (ICRA) 3803–3810 (IEEE, 2018).

  17. Tan, J. et al. Sim-to-real: learning agile locomotion for quadruped robots. In Proc. Robotics: Science and Systems (RSS) 1–11 (RSS foundation, 2018).

  18. Wang, J. & Jiang, J. Learning across tasks for zero-shot domain adaptation from a single source domain. IEEE Trans. Pattern Anal. Mach. Intell. 44, 6264–6279 (2021).

  19. Daumé, H. III. Frustratingly easy domain adaptation. In Proc. 45th Annual Meeting of the Association of Computational Linguistics 256–263 (2007).

  20. Ben-David, S. et al. Analysis of representations for domain adaptation. Adv. Neural Inf. Process. Syst. 19, 137–144 (2007).

  21. Tremblay, J. et al. Training deep networks with synthetic data: bridging the reality gap by domain randomization. In Proc. IEEE Conference on Computer Vision and Pattern Recognition Workshop 969–977 (IEEE, 2018).

  22. Tobin, J. et al. Domain randomization and generative models for robotic grasping. In Proc. International Conference on Intelligent Robots and Systems (IROS) 3482–3489 (IEEE, 2018).

  23. Christiano, P. et al. Transfer from simulation to real world through learning deep inverse dynamics model. Preprint at https://arxiv.org/abs/1610.03518 (2016).

  24. Hanna, J. P., Desai, S., Karnan, H., Warnell, G. & Stone, P. Grounded action transformation for sim-to-real reinforcement learning. Mach. Learn. 110, 2469–2499 (2021).

  25. Rusu, A. A. et al. Progressive neural networks. Preprint at https://arxiv.org/abs/1606.04671 (2016).

  26. Zhang, Z. et al. Progressive neural networks for image classification. Preprint at https://arxiv.org/abs/1804.09803 (2018).

  27. Mishra, N., Rohaninejad, M., Chen, X. & Abbeel, P. A simple neural attentive meta-learner. In Proc. International Conference on Learning Representations (ICLR) 1–17 (OpenReview.net, 2018).

  28. Xu, Z., van Hasselt, H. & Silver, D. Meta-gradient reinforcement learning. Adv. Neural Inf. Process. Syst. 31, 2402–2413 (2018).

  29. Clavera, I. et al. Model-based reinforcement learning via meta-policy optimization. In Proc. 2nd Annual Conference on Robot Learning 87, 617–629 (PMLR, 2018).

  30. Finn, C., Abbeel, P. & Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proc. International Conference on Machine Learning (ICML) (eds Precup, D. & Teh, Y. W.) 1126–1135 (JMLR.org, 2017).

  31. Zhao, W., Queralta, J. P. & Westerlund, T. Sim-to-real transfer in deep reinforcement learning for robotics: a survey. In IEEE Symposium Series on Computational Intelligence (SSCI) 737–744 (IEEE, 2020).

  32. Taylor, M. E. & Stone, P. Transfer learning for reinforcement learning domains: a survey. J. Mach. Learn. Res. 10, 1633–1685 (2009).

  33. Kaelbling, L. P., Littman, M. L. & Moore, A. W. Reinforcement learning: a survey. J. Artif. Intell. Res. 4, 237–285 (1996).

  34. Wu, J., Huang, Z. & Lv, C. Uncertainty-aware model-based reinforcement learning: methodology and application in autonomous driving. IEEE Trans. Intell. Veh. https://doi.org/10.1109/TIV.2022.3185159 (2022).

  35. Watkins, C. J. & Dayan, P. Q-learning. Mach. Learn. 8, 279–292 (1992).

  36. Li, S. et al. Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient. In Proc. AAAI Conference on Artificial Intelligence Vol. 33, 4213–4220 (AAAI Press, 2019).

  37. Tobin, J. et al. Domain randomization for transferring deep neural networks from simulation to the real world. In Proc. International Conference on Intelligent Robots and Systems (IROS) 23–30 (IEEE, 2017).

  38. Tzeng, E., Hoffman, J., Zhang, N., Saenko, K. & Darrell, T. Deep domain confusion: maximizing for domain invariance. Preprint at https://arxiv.org/abs/1412.3474 (2014).

  39. Long, M., Cao, Y., Wang, J. & Jordan, M. Learning transferable features with deep adaptation networks. In Proc. International Conference on Machine Learning (ICML) 37, 97–105 (JMLR.org, 2015).

  40. Sun, B., Feng, J. & Saenko, K. Return of frustratingly easy domain adaptation. In Proc. AAAI Conference on Artificial Intelligence Vol. 30, 2058–2065 (AAAI Press, 2016).

  41. Ganin, Y. et al. Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17, 2096–2030 (2016).

  42. Tzeng, E., Hoffman, J., Darrell, T. & Saenko, K. Simultaneous deep transfer across domains and tasks. In Proc. International Conference on Computer Vision (ICCV) 4068–4076 (IEEE, 2015).

  43. Bousmalis, K., Silberman, N., Dohan, D., Erhan, D. & Krishnan, D. Unsupervised pixel-level domain adaptation with generative adversarial networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 3722–3731 (IEEE, 2017).

  44. Bousmalis, K., Trigeorgis, G., Silberman, N., Krishnan, D. & Erhan, D. Domain separation networks. Adv. Neural Inf. Process. Syst. 29, 343–351 (2016).

  45. Carr, T., Chli, M. & Vogiatzis, G. Domain adaptation for reinforcement learning on the Atari. In Proc. 18th International Conference on Autonomous Agents and MultiAgent Systems 1859–1861 (International Foundation for Autonomous Agents and Multiagent Systems, 2019).

  46. Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The Arcade Learning Environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013).

  47. Tzeng, E. et al. Adapting deep visuomotor representations with weak pairwise constraints. In Algorithmic Foundations of Robotics XII (eds Goldberg, K. et al.) 688–703 (Springer, 2020).

  48. Wise, M., Ferguson, M., King, D., Diehr, E. & Dymesich, D. Fetch and Freight: standard platforms for service robot applications. In Workshop on Autonomous Mobile Service Robots 1–6 (2016).

  49. Xu, Y. & Vatankhah, H. SimSpark: an open source robot simulator developed by the RoboCup community. In RoboCup 2013: Robot World Cup XVII (eds S. Behnke et al.) 632–639 (Springer, 2013).

  50. Koenig, N. & Howard, A. Design and use paradigms for Gazebo, an open-source multi-robot simulator. In Proc. International Conference on Intelligent Robots and Systems (IROS) Vol. 3, 2149–2154 (IEEE, 2004).

  51. Desai, S. et al. An imitation from observation approach to transfer learning with dynamics mismatch. Adv. Neural Inf. Process. Syst. 33, 3917–3929 (2020).

  52. Karnan, H., Desai, S., Hanna, J. P., Warnell, G. & Stone, P. Reinforced grounded action transformation for sim-to-real transfer. In Proc. International Conference on Intelligent Robots and Systems (IROS) 4397–4402 (IEEE, 2020).

  53. Rusu, A. A. et al. Sim-to-real robot learning from pixels with progressive nets. In Proc. 1st Annual Conference on Robot Learning 78, 262–270 (PMLR, 2017).

  54. Wang, J. X. et al. Prefrontal cortex as a meta-reinforcement learning system. Nat. Neurosci. 21, 860–868 (2018).

  55. Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. Preprint at https://arxiv.org/abs/1707.06347 (2017).

  56. Arndt, K., Hazara, M., Ghadirzadeh, A. & Kyrki, V. Meta reinforcement learning for sim-to-real domain adaptation. In Proc. IEEE International Conference on Robotics and Automation 2725–2731 (IEEE, 2020).

  57. Nagabandi, A. et al. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In Proc. International Conference on Learning Representations 1–17 (OpenReview.net, 2019).

  58. Chebotar, Y. et al. Closing the sim-to-real loop: adapting simulation randomization with real world experience. In Proc. International Conference on Robotics and Automation (ICRA) 8973–8979 (IEEE, 2019).

  59. Mehta, B., Diaz, M., Golemo, F., Pal, C. J. & Paull, L. Active domain randomization. In Proc. 4th Annual Conference on Robot Learning 100, 1162–1176 (PMLR, 2020).

  60. Muratore, F., Gienger, M. & Peters, J. Assessing transferability from simulation to reality for reinforcement learning. IEEE Trans. Pattern Anal. Mach. Intell. 43, 1172–1183 (2021).

  61. Rusu, A. A. et al. Policy distillation. In Proc. International Conference on Learning Representations (ICLR) 1–13 (OpenReview.net, 2016).

  62. Traoré, R. et al. DisCoRL: continual reinforcement learning via policy distillation. In NeurIPS Workshop on Deep Reinforcement Learning 1–15 (2019).

  63. James, S. et al. Sim-to-real via sim-to-sim: data-efficient robotic grasping via randomized-to-canonical adaptation networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 12627–12637 (IEEE, 2019).

  64. Kalashnikov, D. et al. QT-Opt: scalable deep reinforcement learning for vision-based robotic manipulation. In Proc. 2nd Annual Conference on Robot Learning Vol. 87, 651–673 (PMLR, 2018).

  65. Ljung, L. System identification. In Signal Analysis and Prediction (eds A. Procházka et al.) 163–173 (Springer, 1998).

  66. Åström, K. J. & Eykhoff, P. System identification—a survey. Automatica 7, 123–162 (1971).

  67. Lowrey, K., Kolev, S., Dao, J., Rajeswaran, A. & Todorov, E. Reinforcement learning for non-prehensile manipulation: transfer from simulation to physical system. In Proc. IEEE International Conference on Simulation, Modeling, and Programming for Autonomous Robots (SIMPAR) 35–42 (IEEE, 2018).

  68. Antonova, R., Cruciani, S., Smith, C. & Kragic, D. Reinforcement learning for pivoting task. Preprint at https://arxiv.org/abs/1703.00472 (2017).

  69. Shah, S., Dey, D., Lovett, C. & Kapoor, A. AirSim: high-fidelity visual and physical simulation for autonomous vehicles. In Field and Service Robotics (eds M. Hutter & R. Siegwart) 621–635 (Springer Proceedings in Advanced Robotics Vol. 5, Springer, 2018).

  70. Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A. & Koltun, V. CARLA: an open urban driving simulator. In Proc. 1st Annual Conference on Robot Learning 78, 1–16 (2017).

  71. Kottas, G. S., Clarke, L. I., Horinek, D. & Michl, J. Artificial molecular rotors. Chem. Rev. 105, 1281–1376 (2005).

  72. McCord, C., Queralta, J. P., Gia, T. N. & Westerlund, T. Distributed progressive formation control for multi-agent systems: 2D and 3D deployment of UAVs in ROS/Gazebo with RotorS. In Proc. European Conference on Mobile Robots (ECMR) 1–6 (IEEE, 2019).

  73. Coumans, E. & Bai, Y. PyBullet, a Python Module for Physics Simulation for Games, Robotics and Machine Learning https://pybullet.org/wordpress/ (2016).

  74. Todorov, E., Erez, T. & Tassa, Y. MuJoCo: a physics engine for model-based control. In Proc. International Conference on Intelligent Robots and Systems (IROS) 5026–5033 (IEEE, 2012).

  75. Morimoto, J. & Doya, K. Robust reinforcement learning. Neural Comput. 17, 335–359 (2005).

  76. Tessler, C., Efroni, Y. & Mannor, S. Action robust reinforcement learning and applications in continuous control. In Proc. International Conference on Machine Learning (ICML) 97, 6215–6224 (JMLR.org, 2019).

  77. Mankowitz, D. J. et al. Robust reinforcement learning for continuous control with model misspecification. In Proc. International Conference on Learning Representations (ICLR) 1–11 (OpenReview.net, 2020).

  78. Garcıa, J. & Fernández, F. A comprehensive survey on safe reinforcement learning. J. Mach. Learn. Res. 16, 1437–1480 (2015).

  79. Saunders, W., Sastry, G., Stuhlmüller, A. & Evans, O. Trial without error: towards safe reinforcement learning via human intervention. In Proc. 17th International Conference on Autonomous Agents and MultiAgent Systems 2067–2069 (International Foundation for Autonomous Agents and Multiagent Systems, 2018).

  80. Xie, T., Jiang, N., Wang, H., Xiong, C. & Bai, Y. Policy finetuning: bridging sample-efficient offline and online reinforcement learning. Adv. Neural Inf. Process. Syst. 34, 27395–27407 (2021).

  81. Lee, S., Seo, Y., Lee, K., Abbeel, P. & Shin, J. Offline-to-online reinforcement learning via balanced replay and pessimistic Q-ensemble. In Proc. 6th Annual Conference on Robot Learning 164, 1702–1712 (2022).

  82. Christiano, P. F. et al. Deep reinforcement learning from human preferences. Adv. Neural Inf. Process. Syst. 30, 4302–4310 (2017).

  83. Li, G., Whiteson, S., Knox, W. B. & Hung, H. Social interaction for efficient agent learning from human reward. Auton. Agent Multi Agent Syst. 32, 1–25 (2018).

  84. Li, G., He, B., Gomez, R. & Nakamura, K. Interactive reinforcement learning from demonstration and human evaluative feedback. In Proc. 27th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN) 1156–1162 (IEEE, 2018).

  85. Arora, S. & Doshi, P. A survey of inverse reinforcement learning: challenges, methods and progress. Artif. Intell. 297, 103500 (2021).

  86. Juan, R. et al. Shaping progressive net of reinforcement learning for policy transfer with human evaluative feedback. In Proc. IEEE International Conference on Intelligent Robots and Systems (IROS) 1281–1288 (IEEE, 2021).

  87. Li, G., Gomez, R., Nakamura, K. & He, B. Human-centered reinforcement learning: a survey. IEEE Trans. Hum. Mach. Syst. 49, 337–349 (2019).

  88. Neftci, E. O. & Averbeck, B. B. Reinforcement learning in artificial and biological systems. Nat. Mach. Intell. 1, 133–143 (2019).

Acknowledgements

This work was supported by the National Natural Science Foundation of China (grant 51809246) and by the Honda Research Institute Japan Co., Ltd. We especially thank L. Wang for taking the time to provide feedback.

Author information

Corresponding author

Correspondence to Guangliang Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review information

Nature Machine Intelligence thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ju, H., Juan, R., Gomez, R. et al. Transferring policy of deep reinforcement learning from simulation to reality for robotics. Nat Mach Intell 4, 1077–1087 (2022). https://doi.org/10.1038/s42256-022-00573-6
