Abstract
This paper presents Dual Action Policy (DAP), a novel approach to addressing the dynamics mismatch that underlies the sim-to-real gap in reinforcement learning. DAP uses a single policy to predict two sets of actions: one for maximizing task rewards in simulation and another dedicated to domain adaptation via reward adjustments. This decoupling makes it easier to maximize the overall reward in the source domain during training. In addition, DAP incorporates uncertainty-based exploration during training to enhance agent robustness. Experimental results demonstrate DAP’s effectiveness in bridging the sim-to-real gap, outperforming baselines on challenging simulated tasks, with further gains from incorporating uncertainty estimation.
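To make the dual-action idea concrete, the following is a minimal sketch of what a shared policy with two action heads and an ensemble-disagreement uncertainty signal could look like. It is not the authors' implementation: the class and attribute names (DualActionPolicy, task_head, adapt_head, ensemble_uncertainty) and the PyTorch/SAC-style setup are illustrative assumptions, and the exact reward-adjustment and exploration mechanisms are defined in the paper body rather than the abstract.

```python
# Sketch only: a single policy that emits two action sets from a shared encoding,
# as suggested by the abstract. All names here are hypothetical.
import torch
import torch.nn as nn


class DualActionPolicy(nn.Module):
    """Shared encoder with two heads: one action aimed at the task reward,
    one aimed at the domain-adaptation reward adjustment."""

    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Head 1: action optimized for the simulation task reward.
        self.task_head = nn.Linear(hidden_dim, action_dim)
        # Head 2: action targeting the reward adjustment used for domain adaptation.
        self.adapt_head = nn.Linear(hidden_dim, action_dim)

    def forward(self, state: torch.Tensor):
        h = self.encoder(state)
        task_action = torch.tanh(self.task_head(h))
        adapt_action = torch.tanh(self.adapt_head(h))
        return task_action, adapt_action


def ensemble_uncertainty(q_values: torch.Tensor) -> torch.Tensor:
    """Disagreement across an ensemble of critic predictions, a common proxy for
    epistemic uncertainty that could drive an exploration bonus.
    q_values: (ensemble_size, batch) -> per-sample standard deviation."""
    return q_values.std(dim=0)


# Illustrative usage: each head can then be trained against its own reward component,
# which is what decouples task-reward maximization from the adaptation adjustment.
policy = DualActionPolicy(state_dim=17, action_dim=6)
a_task, a_adapt = policy(torch.randn(32, 17))
```

In such a setup, only the task head's action would be executed in the environment, while the adaptation head's output shapes the adjusted reward; how the two heads and the uncertainty bonus are combined is specified in the full paper, so this sketch illustrates only the dual-head structure.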
Acknowledgment
This study is supported under the RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contributions from the industry partner(s).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Terence, N.W.Z., Jianda, C. (2024). Dual Action Policy for Robust Sim-to-Real Reinforcement Learning. In: Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) Artificial Neural Networks and Machine Learning – ICANN 2024. ICANN 2024. Lecture Notes in Computer Science, vol 15019. Springer, Cham. https://doi.org/10.1007/978-3-031-72341-4_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72340-7
Online ISBN: 978-3-031-72341-4