Multi-agent reinforcement learning for redundant robot control in task-space

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

Task-space control requires an inverse kinematics solution or the Jacobian matrix to transform from task space to joint space. However, these are not always available for redundant robots, which have more joint degrees of freedom than Cartesian degrees of freedom. Intelligent learning methods, such as neural networks (NN) and reinforcement learning (RL), can learn the inverse kinematics solution; however, NNs need large amounts of data, and classical RL is not suitable for multi-link robots controlled in task space. In this paper, we propose a fully cooperative multi-agent reinforcement learning (MARL) scheme to solve the kinematic problem of redundant robots. Each joint of the robot is regarded as one agent. The fully cooperative MARL uses kinematic learning to avoid function approximators and a large learning space. The convergence property of the proposed MARL is analyzed. The experimental results show that our MARL outperforms classic methods such as Jacobian-based methods and neural networks.
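To make the joint-per-agent decomposition concrete, the sketch below is a minimal, illustrative implementation, not the algorithm analyzed in the paper: fully cooperative tabular Q-learning for a planar 3-link arm, which is redundant for a 2-D positioning task (three joint degrees of freedom, two Cartesian degrees of freedom). Each joint is an agent with its own Q-table over only its own actions, and all agents share one task-space reward. The link lengths, discretization, and hyperparameters are assumptions chosen for clarity.

```python
# Minimal sketch of fully cooperative MARL for a planar 3-link arm,
# NOT the paper's exact algorithm: one tabular Q-learning agent per
# joint, all agents sharing the same task-space reward. Link lengths,
# discretization, and hyperparameters are illustrative assumptions.
import numpy as np

L = np.array([0.3, 0.25, 0.2])          # assumed link lengths [m]
ACTIONS = np.array([-0.05, 0.0, 0.05])  # joint increments [rad]
N_JOINTS, N_BINS = 3, 15                # agents; error bins per axis
alpha, gamma, eps = 0.1, 0.9, 0.2       # assumed learning hyperparameters

def fkine(q):
    """Forward kinematics: end-effector (x, y) of the planar serial arm."""
    angles = np.cumsum(q)               # absolute link angles
    return np.array([np.sum(L * np.cos(angles)),
                     np.sum(L * np.sin(angles))])

def state_of(err):
    """Discretize the task-space error into one joint (x, y) bin index."""
    bins = np.clip(((err + 1.0) / 2.0 * N_BINS).astype(int), 0, N_BINS - 1)
    return bins[0] * N_BINS + bins[1]

# One Q-table per agent: each joint only learns over its own 3 actions,
# rather than a single agent learning over 3^3 joint-action combinations.
Q = np.zeros((N_JOINTS, N_BINS * N_BINS, len(ACTIONS)))

target = np.array([0.4, 0.3])           # reachable task-space goal
for episode in range(2000):
    q = np.zeros(N_JOINTS)
    s = state_of(target - fkine(q))
    for step in range(50):
        # Each agent picks its own action (epsilon-greedy).
        a = [np.random.randint(len(ACTIONS)) if np.random.rand() < eps
             else int(np.argmax(Q[i, s])) for i in range(N_JOINTS)]
        q = q + ACTIONS[a]
        err = target - fkine(q)
        s2 = state_of(err)
        r = -np.linalg.norm(err)         # shared (fully cooperative) reward
        for i in range(N_JOINTS):        # independent per-agent updates
            Q[i, s, a[i]] += alpha * (r + gamma * Q[i, s2].max()
                                      - Q[i, s, a[i]])
        s = s2
```

Because each agent's table covers only its own three actions, the number of learned values grows linearly with the number of joints instead of exponentially with the joint-action combinations; this illustrates the small learning space that the fully cooperative decomposition targets.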


Notes

  1. Task-space (or Cartesian space) is defined by the position and orientation of the robot's end effector. Joint-space is defined by the angular displacement of each joint of the robot.
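As a concrete illustration of this footnote (a hedged sketch; the link lengths are assumed, not taken from the paper), the snippet below maps two distinct joint-space configurations of a planar 2-link arm to the same task-space point, which is why an inverse kinematics solution is generally not unique.

```python
# Joint space -> task space for a planar 2-link arm; link lengths are
# assumed values for illustration only.
import numpy as np

L1, L2 = 0.3, 0.25                           # assumed link lengths [m]

def fkine(q):
    """Map a joint-space point (q1, q2) to the task-space point (x, y)."""
    return np.array([L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
                     L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])])

q_a = np.array([0.6, 0.8])                   # "elbow-down" configuration
# Mirrored "elbow-up" solution reaching the same end-effector point:
psi = np.arctan2(L2 * np.sin(q_a[1]), L1 + L2 * np.cos(q_a[1]))
q_b = np.array([q_a[0] + 2.0 * psi, -q_a[1]])

print(fkine(q_a))  # approx. [0.290 0.416]
print(fkine(q_b))  # same task-space point, different joint-space point
```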


Author information

Corresponding author

Correspondence to Wen Yu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Perrusquía, A., Yu, W. & Li, X. Multi-agent reinforcement learning for redundant robot control in task-space. Int. J. Mach. Learn. & Cyber. 12, 231–241 (2021). https://doi.org/10.1007/s13042-020-01167-7

