Abstract
While deep reinforcement learning (DRL) enhances the flexibility and intelligence of a single robot, it has proven challenging to solve even basic tasks cooperatively. Moreover, reward shaping for robotic manipulation is cumbersome and can easily leave the policy trapped in local optima. As such, sparse rewards are an attractive alternative. In this paper, we demonstrate how teams of robots are able to solve cooperative tasks. Additionally, we provide insights on how to facilitate exploration and faster learning in collaborative systems. First, we increase the number of effective data samples in the replay buffer by leveraging virtual targets. Second, we introduce a small number of expert demonstrations to guide the robot during training via an additional loss that forces the policy network to learn the expert data faster. Finally, to improve the quality of behavior cloning, we propose a Judge mechanism that updates the strategy by selecting the optimal action during training. Our algorithms were tested in simulation using both dual arms and teams of two robots with single arms.
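The virtual-target idea can be illustrated with a hindsight-style relabeling step. The sketch below is a minimal illustration, not the chapter's implementation: it assumes a sparse reward of 0 on success and -1 otherwise, a distance tolerance `tol`, and the "final achieved goal" relabeling strategy. The names `Transition`, `sparse_reward`, and `relabel_with_final_goal` are hypothetical.

```python
import math
from typing import List, NamedTuple, Tuple

Vector = Tuple[float, ...]


class Transition(NamedTuple):
    """One replay-buffer entry (field names are illustrative assumptions)."""
    state: Vector
    action: Vector
    achieved_goal: Vector  # goal-space position reached after the action
    desired_goal: Vector   # target the agent was asked to reach
    reward: float


def sparse_reward(achieved: Vector, desired: Vector, tol: float = 0.05) -> float:
    """Assumed sparse reward: 0.0 within `tol` of the target, otherwise -1.0."""
    return 0.0 if math.dist(achieved, desired) <= tol else -1.0


def relabel_with_final_goal(episode: List[Transition], tol: float = 0.05) -> List[Transition]:
    """Return a copy of the episode whose target is replaced by a virtual one.

    The virtual target is the goal actually achieved at the end of the episode,
    so even a failed episode yields at least one successful transition.
    """
    virtual_goal = episode[-1].achieved_goal
    return [
        t._replace(
            desired_goal=virtual_goal,
            reward=sparse_reward(t.achieved_goal, virtual_goal, tol),
        )
        for t in episode
    ]
```

In such a scheme, both the original and the relabeled transitions would be stored in the replay buffer, so that an off-policy learner sees successful samples even before it reaches the true target.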
This work is partially supported by the Key R&D Programmes of Guangdong Province (Grant No. 2019B090915001), the Frontier and Key Technology Innovation Special Funds of Guangdong Province (Grant No. 2017B050506008), and NSFC (Grant No. 51975126, 51905105).
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Z., Li, Y., Rojas, J., Guan, Y. (2021). Leveraging Expert Demonstrations in Robot Cooperation with Multi-Agent Reinforcement Learning. In: Liu, XJ., Nie, Z., Yu, J., Xie, F., Song, R. (eds) Intelligent Robotics and Applications. ICIRA 2021. Lecture Notes in Computer Science, vol 13014. Springer, Cham. https://doi.org/10.1007/978-3-030-89098-8_20
DOI: https://doi.org/10.1007/978-3-030-89098-8_20
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89097-1
Online ISBN: 978-3-030-89098-8
eBook Packages: Computer Science, Computer Science (R0)