Abstract
This paper presents Dual Action Policy (DAP), a novel approach to addressing the dynamics mismatch that underlies the sim-to-real gap in reinforcement learning. DAP uses a single policy to predict two sets of actions: one for maximizing task rewards in simulation and another dedicated to domain adaptation via reward adjustments. This decoupling makes it easier to maximize the overall reward in the source domain during training. In addition, DAP incorporates uncertainty-based exploration during training to enhance agent robustness. Experimental results demonstrate DAP’s effectiveness in bridging the sim-to-real gap, outperforming baselines on challenging simulated tasks, with further gains from incorporating uncertainty estimation.
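To make the dual-action idea concrete, the following is a minimal sketch of what a shared policy with two action heads and an ensemble-disagreement uncertainty signal could look like. It is not the authors' implementation: the class and attribute names (DualActionPolicy, task_head, adapt_head, ensemble_uncertainty) and the PyTorch/SAC-style setup are illustrative assumptions, and the exact reward-adjustment and exploration mechanisms are defined in the paper body rather than the abstract.

```python
# Sketch only: a single policy that emits two action sets from a shared encoding,
# as suggested by the abstract. All names here are hypothetical.
import torch
import torch.nn as nn


class DualActionPolicy(nn.Module):
    """Shared encoder with two heads: one action aimed at the task reward,
    one aimed at the domain-adaptation reward adjustment."""

    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Head 1: action optimized for the simulation task reward.
        self.task_head = nn.Linear(hidden_dim, action_dim)
        # Head 2: action targeting the reward adjustment used for domain adaptation.
        self.adapt_head = nn.Linear(hidden_dim, action_dim)

    def forward(self, state: torch.Tensor):
        h = self.encoder(state)
        task_action = torch.tanh(self.task_head(h))
        adapt_action = torch.tanh(self.adapt_head(h))
        return task_action, adapt_action


def ensemble_uncertainty(q_values: torch.Tensor) -> torch.Tensor:
    """Disagreement across an ensemble of critic predictions, a common proxy for
    epistemic uncertainty that could drive an exploration bonus.
    q_values: (ensemble_size, batch) -> per-sample standard deviation."""
    return q_values.std(dim=0)


# Illustrative usage: each head can then be trained against its own reward component,
# which is what decouples task-reward maximization from the adaptation adjustment.
policy = DualActionPolicy(state_dim=17, action_dim=6)
a_task, a_adapt = policy(torch.randn(32, 17))
```

In such a setup, only the task head's action would be executed in the environment, while the adaptation head's output shapes the adjusted reward; how the two heads and the uncertainty bonus are combined is specified in the full paper, so this sketch illustrates only the dual-head structure.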
Acknowledgment
This study is supported under the RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contributions from the industry partner(s).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Terence, N.W.Z., Jianda, C. (2024). Dual Action Policy for Robust Sim-to-Real Reinforcement Learning. In: Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) Artificial Neural Networks and Machine Learning – ICANN 2024. ICANN 2024. Lecture Notes in Computer Science, vol 15019. Springer, Cham. https://doi.org/10.1007/978-3-031-72341-4_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72340-7
Online ISBN: 978-3-031-72341-4