  • Review Article

Transferring policy of deep reinforcement learning from simulation to reality for robotics

Abstract

Deep reinforcement learning has achieved great success in many fields and has shown promise for learning robust skills for robot control in recent years. However, sample efficiency and safety problems still limit its application to real-world robot control. One common solution is to train the control policy in a simulation environment and then transfer it to the real world. However, policies trained in simulation usually perform poorly in the real world because simulators inevitably model reality imperfectly. Inspired by biological transfer learning processes in the brains of humans and other animals, sim-to-real transfer reinforcement learning has been proposed and has become a focus of researchers applying reinforcement learning to robotics. Here, we describe state-of-the-art sim-to-real transfer reinforcement learning methods, which are inspired by insights into transfer learning in nature, such as extracting features common to the source and target tasks, enriching training experience, multitask learning, continual learning and fast learning. Our objective is to present a comprehensive survey of the most recent advances in sim-to-real transfer reinforcement learning, and we hope it facilitates the application of deep reinforcement learning to complex robot control problems in the real world.
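To make the problem concrete, below is a minimal, self-contained sketch of the sim-to-real workflow described above: a tabular Q-learning policy is trained on a toy one-dimensional reaching task whose 'simulator' ignores action slip, and is then evaluated zero-shot under 'real' dynamics that include it. The task, the slip parameter and every name in the code are illustrative assumptions, not code from any of the surveyed works; the difference between the two printed returns is the reality gap that sim-to-real methods try to close.

import numpy as np

# Toy 1-D reaching task: states 0..9, goal at 9, actions {0: left, 1: right}.
N_STATES, GOAL, MAX_STEPS = 10, 9, 50
rng = np.random.default_rng(0)

def step(state, action, slip):
    # With probability `slip` the commanded move has no effect
    # (an unmodelled friction/backlash effect standing in for the reality gap).
    if rng.random() > slip:
        state = int(np.clip(state + (1 if action == 1 else -1), 0, N_STATES - 1))
    reward = 1.0 if state == GOAL else -0.1
    return state, reward, state == GOAL

def q_learning(slip, episodes=2000, alpha=0.1, gamma=0.95, eps=0.1):
    # Tabular Q-learning under the given dynamics (a stand-in for any deep RL algorithm).
    Q = np.full((N_STATES, 2), 1.0)      # optimistic initialization drives exploration
    for _ in range(episodes):
        s = 0
        for _ in range(MAX_STEPS):
            a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
            s2, r, done = step(s, a, slip)
            Q[s, a] += alpha * (r + gamma * Q[s2].max() * (not done) - Q[s, a])
            s = s2
            if done:
                break
    return Q

def evaluate(Q, slip, episodes=200):
    # Average return of the greedy policy under the given dynamics.
    total = 0.0
    for _ in range(episodes):
        s = 0
        for _ in range(MAX_STEPS):
            s, r, done = step(s, int(Q[s].argmax()), slip)
            total += r
            if done:
                break
    return total / episodes

Q_sim = q_learning(slip=0.0)                                # cheap, safe training in the "simulator"
print("return in simulation:", evaluate(Q_sim, slip=0.0))
print("return in 'reality':", evaluate(Q_sim, slip=0.5))    # zero-shot transfer to slipping dynamics

Under the slipping dynamics the greedy policy needs roughly twice as many steps to reach the goal, so its return drops even though the task itself is unchanged; the methods surveyed below are different ways of shrinking exactly this drop.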

Fig. 1: Extracting representations common to the source and target domains via domain adaptation, for sim-to-real transfer of a policy trained with deep reinforcement learning.
Fig. 2: Domain randomization.
Fig. 3: An example of an inverse dynamics model method for policy training with GAT (ref. 24).
Fig. 4: Diagram of policy training with the meta-learning method MAML.
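Two of the figures above correspond to concrete training loops. For Fig. 2, domain randomization (refs. 16, 21, 22, 37) replaces the single nominal simulator with a distribution of simulators: dynamics or visual parameters are resampled every episode so the policy is trained to succeed across the whole range, which ideally contains the real system. A minimal sketch, reusing the toy task and the step/evaluate helpers defined after the abstract, and assuming the real slip value lies inside the randomized interval:

def q_learning_randomized(slip_range=(0.0, 0.6), episodes=2000,
                          alpha=0.1, gamma=0.95, eps=0.1):
    # Identical to q_learning above, except that the slip parameter is
    # resampled every episode: the agent never sees a single fixed simulator.
    Q = np.full((N_STATES, 2), 1.0)
    for _ in range(episodes):
        slip = rng.uniform(*slip_range)   # a freshly randomized "simulator"
        s = 0
        for _ in range(MAX_STEPS):
            a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
            s2, r, done = step(s, a, slip)
            Q[s, a] += alpha * (r + gamma * Q[s2].max() * (not done) - Q[s, a])
            s = s2
            if done:
                break
    return Q

Q_rand = q_learning_randomized()
print("randomized policy, return in 'reality':", evaluate(Q_rand, slip=0.5))

On this deliberately simple task the optimal action is the same for every slip value, so randomization mainly changes the value estimates rather than the behaviour; in realistic control problems, randomizing masses, friction, latencies or textures changes the learned policy itself, which is what makes zero-shot transfer possible.

For Fig. 4, MAML trains an initialization rather than a final policy: an inner loop adapts the parameters to each sampled task with a few gradient steps, and an outer loop updates the initialization so that this adaptation works well on average. The sketch below is an illustrative scalar version on a family of quadratic losses, not the reinforcement-learning formulation used in the surveyed works; for this quadratic family the second-order term in the meta-gradient reduces to the constant factor (1 - alpha).

# Minimal MAML-style inner/outer loop on tasks L_c(theta) = 0.5 * (theta - c)**2,
# where each task has its own optimum c. The meta-parameter theta is the
# initialization that one inner gradient step should adapt well to any task.
import numpy as np

rng_meta = np.random.default_rng(1)
alpha, beta = 0.1, 0.05                 # inner and outer learning rates
theta = 4.0                             # deliberately poor starting initialization

def loss_grad(theta, c):
    return theta - c                    # d/dtheta of 0.5 * (theta - c)**2

for _ in range(500):
    tasks = rng_meta.uniform(-5.0, 5.0, size=8)              # sample a batch of tasks
    meta_grad = 0.0
    for c in tasks:
        theta_adapted = theta - alpha * loss_grad(theta, c)  # inner adaptation step
        # Chain rule through the inner step: d(theta_adapted)/d(theta) = 1 - alpha here.
        meta_grad += loss_grad(theta_adapted, c) * (1.0 - alpha)
    theta -= beta * meta_grad / len(tasks)                   # outer (meta) update

print("meta-learned initialization:", theta)                 # converges near 0, the task-family centre

At deployment, the robot starts from the meta-learned initialization and runs the same inner-loop adaptation on a small amount of real-world data, which is how MAML-style methods (refs. 30, 56, 57) cope with the simulation-to-reality mismatch.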

References

  1. Sutton, R. & Barto, A. Reinforcement Learning: an Introduction (MIT Press, 2018).

  2. Kober, J., Bagnell, J. A. & Peters, J. Reinforcement learning in robotics: a survey. Int. J. Robot. Res. 32, 1238–1274 (2013).

  3. Dayan, P. & Niv, Y. Reinforcement learning: the good, the bad and the ugly. Curr. Opin. Neurobiol. 18, 185–196 (2008).

  4. Littman, M. L. Reinforcement learning improves behaviour from evaluative feedback. Nature 521, 445–451 (2015).

  5. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

  6. Angelov, P. & Soares, E. Towards explainable deep neural networks (xDNN). Neural Netw. 130, 185–194 (2020).

  7. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

  8. Schölkopf, B. Learning to see and act. Nature 518, 486–487 (2015).

  9. Google DeepMind. AlphaStar: Mastering the Real-Time Strategy Game StarCraft II https://www.deepmind.com/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii (2019).

  10. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

  11. Berner, C. et al. Dota 2 with large scale deep reinforcement learning. Preprint at https://arxiv.org/abs/1912.06680 (2019).

  12. Heess, N. et al. Emergence of locomotion behaviours in rich environments. Preprint at https://arxiv.org/abs/1707.02286 (2017).

  13. Florensa, C., Duan, Y. & Abbeel, P. Stochastic neural networks for hierarchical reinforcement learning. In Proc. International Conference on Learning Representations (ICLR) 1–10 (OpenReview.net, 2017).

  14. Rajeswaran, A. et al. Learning complex dexterous manipulation with deep reinforcement learning and demonstrations. In Proc. Robotics: Science and Systems (RSS) 1–9 (RSS foundation, 2018).

  15. Andrychowicz, M. et al. Learning dexterous in-hand manipulation. Int. J. Robot. Res. 39, 3–20 (2020).

  16. Peng, X. B., Andrychowicz, M., Zaremba, W. & Abbeel, P. Sim-to-real transfer of robotic control with dynamics randomization. In Proc. IEEE International Conference on Robotics and Automation (ICRA) 3803–3810 (IEEE, 2018).

  17. Tan, J. et al. Sim-to-real: learning agile locomotion for quadruped robots. In Proc. Robotics: Science and Systems (RSS) 1–11 (RSS foundation, 2018).

  18. Wang, J. & Jiang, J. Learning across tasks for zero-shot domain adaptation from a single source domain. IEEE Trans. Pattern Anal. Mach. Intell. 44, 6264–6279 (2021).

  19. Daumé, H. III. Frustratingly easy domain adaptation. In Proc. 45th Annual Meeting of the Association of Computational Linguistics 256–263 (2007).

  20. Ben-David, S. et al. Analysis of representations for domain adaptation. Adv. Neural Inf. Process. Syst. 19, 137–144 (2007).

  21. Tremblay, J. et al. Training deep networks with synthetic data: bridging the reality gap by domain randomization. In Proc. IEEE Conference on Computer Vision and Pattern Recognition Workshop 969–977 (IEEE, 2018).

  22. Tobin, J. et al. Domain randomization and generative models for robotic grasping. In Proc. International Conference on Intelligent Robots and Systems (IROS) 3482–3489 (IEEE, 2018).

  23. Christiano, P. et al. Transfer from simulation to real world through learning deep inverse dynamics model. Preprint at https://arxiv.org/abs/1610.03518 (2016).

  24. Hanna, J. P., Desai, S., Karnan, H., Warnell, G. & Stone, P. Grounded action transformation for sim-to-real reinforcement learning. Mach. Learn. 110, 2469–2499 (2021).

  25. Rusu, A. A. et al. Progressive neural networks. Preprint at https://arxiv.org/abs/1606.04671 (2016).

  26. Zhang, Z. et al. Progressive neural networks for image classification. Preprint at https://arxiv.org/abs/1804.09803 (2018).

  27. Mishra, N., Rohaninejad, M., Chen, X. & Abbeel, P. A simple neural attentive meta-learner. In Proc. International Conference on Learning Representations (ICLR) 1–17 (OpenReview.net, 2018).

  28. Xu, Z., van Hasselt, H. & Silver, D. Meta-gradient reinforcement learning. Adv. Neural Inf. Process. Syst. 31, 2402–2413 (2018).

  29. Clavera, I. et al. Model-based reinforcement learning via meta-policy optimization. In Proc. 2nd Annual Conference on Robot Learning 87, 617–629 (PMLR, 2018).

  30. Finn, C., Abbeel, P. & Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proc. International Conference on Machine Learning (ICML) (eds Precup, D. & Teh, Y. W.) 1126–1135 (JMLR.org, 2017).

  31. Zhao, W., Queralta, J. P. & Westerlund, T. Sim-to-real transfer in deep reinforcement learning for robotics: a survey. In IEEE Symposium Series on Computational Intelligence (SSCI) 737–744 (IEEE, 2020).

  32. Taylor, M. E. & Stone, P. Transfer learning for reinforcement learning domains: a survey. J. Mach. Learn. Res. 10, 1633–1685 (2009).

  33. Kaelbling, L. P., Littman, M. L. & Moore, A. W. Reinforcement learning: a survey. J. Artif. Intell. Res. 4, 237–285 (1996).

  34. Wu, J., Huang, Z. & Lv, C. Uncertainty-aware model-based reinforcement learning: methodology and application in autonomous driving. IEEE Trans. Intell. Veh. https://doi.org/10.1109/TIV.2022.3185159 (2022).

  35. Watkins, C. J. & Dayan, P. Q-learning. Mach. Learn. 8, 279–292 (1992).

  36. Li, S. et al. Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient. In Proc. AAAI Conference on Artificial Intelligence Vol. 33, 4213–4220 (AAAI Press, 2019).

  37. Tobin, J. et al. Domain randomization for transferring deep neural networks from simulation to the real world. In Proc. International Conference on Intelligent Robots and Systems (IROS) 23–30 (IEEE, 2017).

  38. Tzeng, E., Hoffman, J., Zhang, N., Saenko, K. & Darrell, T. Deep domain confusion: maximizing for domain invariance. Preprint at https://arxiv.org/abs/1412.3474 (2014).

  39. Long, M., Cao, Y., Wang, J. & Jordan, M. Learning transferable features with deep adaptation networks. In Proc. International Conference on Machine Learning (ICML) 37, 97–105 (JMLR.org, 2015).

  40. Sun, B., Feng, J. & Saenko, K. Return of frustratingly easy domain adaptation. In Proc. AAAI Conference on Artificial Intelligence Vol. 30, 2058–2065 (AAAI Press, 2016).

  41. Ganin, Y. et al. Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17, 2096–2030 (2016).

  42. Tzeng, E., Hoffman, J., Darrell, T. & Saenko, K. Simultaneous deep transfer across domains and tasks. In Proc. International Conference on Computer Vision (ICCV) 4068–4076 (IEEE, 2015).

  43. Bousmalis, K., Silberman, N., Dohan, D., Erhan, D. & Krishnan, D. Unsupervised pixel-level domain adaptation with generative adversarial networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 3722–3731 (IEEE, 2017).

  44. Bousmalis, K., Trigeorgis, G., Silberman, N., Krishnan, D. & Erhan, D. Domain separation networks. Adv. Neural Inf. Process. Syst. 29, 343–351 (2016).

  45. Carr, T., Chli, M. & Vogiatzis, G. Domain adaptation for reinforcement learning on the Atari. In Proc. 18th International Conference on Autonomous Agents and MultiAgent Systems 1859–1861 (International Foundation for Autonomous Agents and Multiagent Systems, 2019).

  46. Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The Arcade Learning Environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013).

  47. Tzeng, E. et al. Adapting deep visuomotor representations with weak pairwise constraints. In Algorithmic Foundations of Robotics XII (eds Goldberg, K. et al.) 688–703 (Springer, 2020).

  48. Wise, M., Ferguson, M., King, D., Diehr, E. & Dymesich, D. Fetch and Freight: standard platforms for service robot applications. In Workshop on Autonomous Mobile Service Robots 1–6 (2016).

  49. Xu, Y. & Vatankhah, H. SimSpark: an open source robot simulator developed by the RoboCup community. In RoboCup 2013: Robot World Cup XVII (eds S. Behnke et al.) 632–639 (Springer, 2013).

  50. Koenig, N. & Howard, A. Design and use paradigms for Gazebo, an open-source multi-robot simulator. In Proc. International Conference on Intelligent Robots and Systems (IROS) Vol. 3, 2149–2154 (IEEE, 2004).

  51. Desai, S. et al. An imitation from observation approach to transfer learning with dynamics mismatch. Adv. Neural Inf. Process. Syst. 33, 3917–3929 (2020).

  52. Karnan, H., Desai, S., Hanna, J. P., Warnell, G. & Stone, P. Reinforced grounded action transformation for sim-to-real transfer. In Proc. International Conference on Intelligent Robots and Systems (IROS) 4397–4402 (IEEE, 2020).

  53. Rusu, A. A. et al. Sim-to-real robot learning from pixels with progressive nets. In Proc. 1st Annual Conference on Robot Learning 78, 262–270 (PMLR, 2017).

  54. Wang, J. X. et al. Prefrontal cortex as a meta-reinforcement learning system. Nat. Neurosci. 21, 860–868 (2018).

  55. Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. Preprint at https://arxiv.org/abs/1707.06347 (2017).

  56. Arndt, K., Hazara, M., Ghadirzadeh, A. & Kyrki, V. Meta reinforcement learning for sim-to-real domain adaptation. In Proc. IEEE International Conference on Robotics and Automation 2725–2731 (IEEE, 2020).

  57. Nagabandi, A. et al. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In Proc. International Conference on Learning Representations 1–17 (OpenReview.net, 2019).

  58. Chebotar, Y. et al. Closing the sim-to-real loop: adapting simulation randomization with real world experience. In Proc. International Conference on Robotics and Automation (ICRA) 8973–8979 (IEEE, 2019).

  59. Mehta, B., Diaz, M., Golemo, F., Pal, C. J. & Paull, L. Active domain randomization. In Proc. 4th Annual Conference on Robot Learning 100, 1162–1176 (PMLR, 2020).

  60. Muratore, F., Gienger, M. & Peters, J. Assessing transferability from simulation to reality for reinforcement learning. IEEE Trans. Pattern Anal. Mach. Intell. 43, 1172–1183 (2021).

  61. Rusu, A. A. et al. Policy distillation. In Proc. International Conference on Learning Representations (ICLR) 1–13 (OpenReview.net, 2016).

  62. Traoré, R. et al. DisCoRL: continual reinforcement learning via policy distillation. In NeurIPS Workshop on Deep Reinforcement Learning 1–15 (2019).

  63. James, S. et al. Sim-to-real via sim-to-sim: data-efficient robotic grasping via randomized-to-canonical adaptation networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 12627–12637 (IEEE, 2019).

  64. Kalashnikov, D. et al. QT-Opt: scalable deep reinforcement learning for vision-based robotic manipulation. In Proc. 2nd Annual Conference on Robot Learning Vol. 87, 651–673 (PMLR, 2018).

  65. Ljung, L. System identification. In Signal Analysis and Prediction (eds A. Procházka et al.) 163–173 (Springer, 1998).

  66. Åström, K. J. & Eykhoff, P. System identification—a survey. Automatica 7, 123–162 (1971).

  67. Lowrey, K., Kolev, S., Dao, J., Rajeswaran, A. & Todorov, E. Reinforcement learning for non-prehensile manipulation: transfer from simulation to physical system. In Proc. IEEE International Conference on Simulation, Modeling, and Programming for Autonomous Robots (SIMPAR) 35–42 (IEEE, 2018).

  68. Antonova, R., Cruciani, S., Smith, C. & Kragic, D. Reinforcement learning for pivoting task. Preprint at https://arxiv.org/abs/1703.00472 (2017).

  69. Shah, S., Dey, D., Lovett, C. & Kapoor, A. AirSim: high-fidelity visual and physical simulation for autonomous vehicles. In Field and Service Robotics (eds M. Hutter & R. Siegwart) 621–635 (Springer Proceedings in Advanced Robotics Vol. 5, Springer, 2018).

  70. Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A. & Koltun, V. CARLA: an open urban driving simulator. In Proc. 1st Annual Conference on Robot Learning 78, 1–16 (2017).

  71. Kottas, G. S., Clarke, L. I., Horinek, D. & Michl, J. Artificial molecular rotors. Chem. Rev. 105, 1281–1376 (2005).

  72. McCord, C., Queralta, J. P., Gia, T. N. & Westerlund, T. Distributed progressive formation control for multi-agent systems: 2D and 3D deployment of UAVs in ROS/Gazebo with RotorS. In Proc. European Conference on Mobile Robots (ECMR) 1–6 (IEEE, 2019).

  73. Coumans, E. & Bai, Y. PyBullet, a Python Module for Physics Simulation for Games, Robotics and Machine Learning https://pybullet.org/wordpress/ (2016).

  74. Todorov, E., Erez, T. & Tassa, Y. MuJoCo: a physics engine for model-based control. In Proc. International Conference on Intelligent Robots and Systems (IROS) 5026–5033 (IEEE, 2012).

  75. Morimoto, J. & Doya, K. Robust reinforcement learning. Neural Comput. 17, 335–359 (2005).

  76. Tessler, C., Efroni, Y. & Mannor, S. Action robust reinforcement learning and applications in continuous control. In Proc. International Conference on Machine Learning (ICML) 97, 6215–6224 (JMLR.org, 2019).

  77. Mankowitz, D. J. et al. Robust reinforcement learning for continuous control with model misspecification. In Proc. International Conference on Learning Representations (ICLR) 1–11 (OpenReview.net, 2020).

  78. Garcıa, J. & Fernández, F. A comprehensive survey on safe reinforcement learning. J. Mach. Learn. Res. 16, 1437–1480 (2015).

  79. Saunders, W., Sastry, G., Stuhlmüller, A. & Evans, O. Trial without error: towards safe reinforcement learning via human intervention. In Proc. 17th International Conference on Autonomous Agents and MultiAgent Systems 2067–2069 (International Foundation for Autonomous Agents and Multiagent Systems, 2018).

  80. Xie, T., Jiang, N., Wang, H., Xiong, C. & Bai, Y. Policy finetuning: bridging sample-efficient offline and online reinforcement learning. Adv. Neural Inf. Process. Syst. 34, 27395–27407 (2021).

  81. Lee, S., Seo, Y., Lee, K., Abbeel, P. & Shin, J. Offline-to-online reinforcement learning via balanced replay and pessimistic Q-ensemble. In Proc. 6th Annual Conference on Robot Learning 164, 1702–1712 (2022).

  82. Christiano, P. F. et al. Deep reinforcement learning from human preferences. Adv. Neural Inf. Process. Syst. 30, 4302–4310 (2017).

  83. Li, G., Whiteson, S., Knox, W. B. & Hung, H. Social interaction for efficient agent learning from human reward. Auton. Agent Multi Agent Syst. 32, 1–25 (2018).

  84. Li, G., He, B., Gomez, R. & Nakamura, K. Interactive reinforcement learning from demonstration and human evaluative feedback. In Proc. 27th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN) 1156–1162 (IEEE, 2018).

  85. Arora, S. & Doshi, P. A survey of inverse reinforcement learning: challenges, methods and progress. Artif. Intell. 297, 103500 (2021).

  86. Juan, R. et al. Shaping progressive net of reinforcement learning for policy transfer with human evaluative feedback. In Proc. IEEE International Conference on Intelligent Robots and Systems (IROS) 1281–1288 (IEEE, 2021).

  87. Li, G., Gomez, R., Nakamura, K. & He, B. Human-centered reinforcement learning: a survey. IEEE Trans. Hum. Mach. Syst. 49, 337–349 (2019).

  88. Neftci, E. O. & Averbeck, B. B. Reinforcement learning in artificial and biological systems. Nat. Mach. Intell. 1, 133–143 (2019).

Acknowledgements

This work was supported by the National Natural Science Foundation of China (grant 51809246) and by the Honda Research Institute Japan Co., Ltd. We especially thank L. Wang for taking the time to provide feedback.

Author information

Corresponding author

Correspondence to Guangliang Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review information

Nature Machine Intelligence thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ju, H., Juan, R., Gomez, R. et al. Transferring policy of deep reinforcement learning from simulation to reality for robotics. Nat Mach Intell 4, 1077–1087 (2022). https://doi.org/10.1038/s42256-022-00573-6
