Abstract
Underwater robots operating in shallow water typically suffer strong wave forces that may frequently exceed the robot's control constraints. Learning-based controllers are well suited to disturbance rejection, but excessive disturbances heavily affect the state transitions of the underlying Markov Decision Process (MDP) or Partially Observable Markov Decision Process (POMDP), an issue amplified by mismatch between the training and test models. In this paper, we propose a transfer reinforcement learning algorithm with Transition Mismatch Compensation (TMC), which learns an additional compensatory policy by minimizing the mismatch between the transitions predicted by the dynamics models of the source and target tasks. We employ a modular network of learned policies, composed of a Generalized Control Policy (GCP) and an Online Disturbance Identification Model (ODI). The GCP is first trained over a wide array of disturbance waveforms; the ODI then learns to use the past states and actions of the system to predict the disturbance waveform, which is provided as input to the GCP along with the system state. On a pose regulation task in simulation, we demonstrate that TMC successfully rejects the disturbances and stabilizes the robot under an empirical model of the robot system, while also improving sample efficiency.
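To make the moving parts of the abstract concrete, the sketch below shows one plausible way the three components could fit together. This is a minimal illustration assuming an LSTM-based ODI, an MLP-based GCP, additive compensation, and a squared-error transition-mismatch loss; the module names mirror the paper, but all architectures, dimensions, and the `f_source`/`f_target` dynamics-model interfaces are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ODI(nn.Module):
    """Online Disturbance Identification: predicts a disturbance-waveform
    encoding from a history of (state, action) pairs. Architecture assumed."""
    def __init__(self, state_dim, action_dim, enc_dim, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(state_dim + action_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, enc_dim)

    def forward(self, hist_states, hist_actions):
        # hist_states: (B, T, state_dim), hist_actions: (B, T, action_dim)
        x = torch.cat([hist_states, hist_actions], dim=-1)
        out, _ = self.rnn(x)
        return self.head(out[:, -1])  # encoding from the last time step

class GCP(nn.Module):
    """Generalized Control Policy: maps the current state plus the
    predicted disturbance encoding to a bounded control action."""
    def __init__(self, state_dim, enc_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + enc_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh())

    def forward(self, state, enc):
        return self.net(torch.cat([state, enc], dim=-1))

def tmc_loss(comp_policy, f_source, f_target, state, base_action):
    """Hypothetical transition-mismatch objective: train the compensatory
    policy so that the target-task dynamics model, driven by the corrected
    action, predicts the same next state as the source-task model does
    under the base (GCP) action alone."""
    corrected = base_action + comp_policy(state)
    mismatch = f_target(state, corrected) - f_source(state, base_action)
    return (mismatch ** 2).mean()
```

At deployment, the GCP action and the compensatory action would be summed and clipped to the actuator limits before being applied; under these assumptions, training the compensatory policy against `tmc_loss` requires only rollouts of the two learned dynamics models, which is consistent with the sample-efficiency claim in the abstract.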
Data Availability Statement
The data generated or analysed during the current study are available from the corresponding author on reasonable request.
Funding
This work was supported by the Robotics Institute at the University of Technology Sydney.
Author information
Contributions
Conceptualization: TW, WL and DL; Methodology: TW and WL; Formal analysis and investigation: TW and WL; Writing - original draft preparation: TW; Writing - review and editing: WL and DL; Funding acquisition: DL; Experiments: TW, WL and HY; Supervision: WL and DL.
Ethics declarations
Conflict of Interest
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, T., Lu, W., Yu, H. et al. Modular transfer learning with transition mismatch compensation for excessive disturbance rejection. Int. J. Mach. Learn. & Cyber. 14, 295–311 (2023). https://doi.org/10.1007/s13042-022-01641-4