
Dual Action Policy for Robust Sim-to-Real Reinforcement Learning

  • Conference paper
  • First Online:
Artificial Neural Networks and Machine Learning – ICANN 2024 (ICANN 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15019)


Abstract

This paper presents Dual Action Policy (DAP), a novel approach to address the dynamics mismatch inherent in the sim-to-real gap of reinforcement learning. DAP uses a single policy to predict two sets of actions: one for maximizing task rewards in simulation and another specifically for domain adaptation via reward adjustments. This decoupling makes it easier to maximize the overall reward in the source domain during training. Additionally, DAP incorporates uncertainty-based exploration during training to enhance agent robustness. Experimental results demonstrate DAP's effectiveness in bridging the sim-to-real gap: it outperforms baselines on challenging simulated tasks, and incorporating uncertainty estimation yields further improvement.
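
To make the decoupling concrete, the sketch below shows one plausible way a single policy network could emit the two action sets described in the abstract: a shared trunk with one head for the task action and one for the domain-adaptation action. This is a minimal illustrative sketch, not the authors' implementation; the class name DualActionPolicy, the head names, and all dimensions are assumptions.

```python
# Minimal sketch (illustrative, not the authors' released code) of a policy
# that predicts two action sets from a shared representation.
import torch
import torch.nn as nn

class DualActionPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        # Shared trunk over the observation.
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Head 1: action trained to maximize the task reward in simulation.
        self.task_head = nn.Linear(hidden, act_dim)
        # Head 2: action dedicated to the domain-adaptation reward adjustment.
        self.adapt_head = nn.Linear(hidden, act_dim)

    def forward(self, obs: torch.Tensor):
        z = self.trunk(obs)
        task_action = torch.tanh(self.task_head(z))    # bounded continuous action
        adapt_action = torch.tanh(self.adapt_head(z))
        return task_action, adapt_action

if __name__ == "__main__":
    policy = DualActionPolicy(obs_dim=17, act_dim=6)   # sizes are placeholders
    obs = torch.randn(4, 17)                           # a batch of observations
    a_task, a_adapt = policy(obs)
    print(a_task.shape, a_adapt.shape)                 # torch.Size([4, 6]) twice
```

In this reading, the two heads can be optimized against different reward terms while sharing features, which is one way the decoupling could simplify maximizing the overall reward; the uncertainty-based exploration mentioned in the abstract would typically be added on top, e.g. via an ensemble of value estimates.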



Acknowledgment

This study is supported under RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s).

Author information


Corresponding author

Correspondence to Ng Wen Zheng Terence.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Terence, N.W.Z., Jianda, C. (2024). Dual Action Policy for Robust Sim-to-Real Reinforcement Learning. In: Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) Artificial Neural Networks and Machine Learning – ICANN 2024. ICANN 2024. Lecture Notes in Computer Science, vol 15019. Springer, Cham. https://doi.org/10.1007/978-3-031-72341-4_25


  • DOI: https://doi.org/10.1007/978-3-031-72341-4_25

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72340-7

  • Online ISBN: 978-3-031-72341-4

  • eBook Packages: Computer Science, Computer Science (R0)
