Modular transfer learning with transition mismatch compensation for excessive disturbance rejection

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

Underwater robots in shallow waters usually suffer from strong wave forces, which may frequently exceed the robot's control constraints. Learning-based controllers are suitable for disturbance rejection control, but excessive disturbances heavily affect the state transition in a Markov Decision Process (MDP) or Partially Observable Markov Decision Process (POMDP). This issue is amplified by the mismatch between the training and test models. In this paper, we propose a transfer reinforcement learning algorithm using Transition Mismatch Compensation (TMC), which learns an additional compensatory policy by minimizing the mismatch between transitions predicted by the two dynamics models of the source and target tasks. A modular network of learning policies is applied, composed of a Generalized Control Policy (GCP) and an Online Disturbance Identification Model (ODI). GCP is first trained over a wide array of disturbance waveforms. ODI then learns to use past states and actions of the system to predict the disturbance waveforms, which are provided as input to GCP along with the system state. We demonstrate on a pose regulation task in simulation that TMC successfully rejects the disturbances and stabilizes the robot under an empirical model of the robot system, while also improving sample efficiency.
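The following is a minimal, illustrative PyTorch sketch of the modular architecture outlined above: an ODI network that infers disturbance-waveform features from a history of states and actions, a GCP that consumes the current state together with those features, and a transition-mismatch signal computed from two dynamics models. The class and function names (OnlineDisturbanceIdentifier, GeneralizedControlPolicy, transition_mismatch), the LSTM/MLP choices, and the layer sizes are assumptions made here for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class OnlineDisturbanceIdentifier(nn.Module):
    """ODI (illustrative): infers disturbance-waveform features from a history of (state, action) pairs."""
    def __init__(self, state_dim, action_dim, hidden_dim, dist_dim):
        super().__init__()
        self.rnn = nn.LSTM(state_dim + action_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, dist_dim)

    def forward(self, history):
        # history: (batch, horizon, state_dim + action_dim)
        _, (h_n, _) = self.rnn(history)
        return self.head(h_n[-1])  # estimated disturbance features

class GeneralizedControlPolicy(nn.Module):
    """GCP (illustrative): maps the current state plus estimated disturbance features to an action."""
    def __init__(self, state_dim, dist_dim, action_dim, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + dist_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, action_dim), nn.Tanh(),
        )

    def forward(self, state, dist_features):
        return self.net(torch.cat([state, dist_features], dim=-1))

def transition_mismatch(source_model, target_model, state, action):
    """Transition mismatch used by TMC (illustrative): the difference between next states
    predicted by the source-task and target-task dynamics models; the compensatory policy
    is trained to counteract the effect of this mismatch on the resulting transitions."""
    return target_model(state, action) - source_model(state, action)

In this reading, GCP is trained first over randomized disturbance waveforms, ODI is trained on histories generated while running GCP, and the compensatory policy is then learned on the target task from the transition-mismatch signal; the exact training procedure is described in the paper.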




Data Availability Statement

The data generated or analysed during the current study are available from the corresponding author on reasonable request.


Funding

This work was supported by the Robotics Institute at the University of Technology Sydney.

Author information

Authors and Affiliations

Authors

Contributions

Conceptualization: TW, WL and DL; Methodology: TW and WL; Formal analysis and investigation: TW and WL; Writing - original draft preparation: TW; Writing - review and editing: WL and DL; Funding acquisition: DL; Experiments: TW, WL and HY; Supervision: WL and DL.

Corresponding author

Correspondence to Wenjie Lu.

Ethics declarations

Conflict of Interest

The authors declare that they have no competing interests relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, T., Lu, W., Yu, H. et al. Modular transfer learning with transition mismatch compensation for excessive disturbance rejection. Int. J. Mach. Learn. & Cyber. 14, 295–311 (2023). https://doi.org/10.1007/s13042-022-01641-4

