Abstract
The target encirclement control of multi-robot systems via deep reinforcement learning has been investigated in this paper. Inspired by the encirclement behavior of dolphins to entrap the fishes, the encirclement control is mainly to enforce the robots to achieve a capturing formation pattern around a target, and can be widely applied in many areas such as coverage, patrolling, escorting, etc. Different from traditional methods, we propose a deep reinforcement learning framework for multi-robot target encirclement formation control, combining the advantages of the deep neural network and deterministic policy gradient algorithm, which is free from the complicated work of building the control model and designing the control law. Our method provides a distributed control architecture for each robot in continuous action space, relying only on local teammate information. Besides, the behavioral output at each time step is determined by its own independent network. In addition, both the robots and the moving target can be trained simultaneously. In that way, both cooperation and competition can be contained, and the results validate the effectiveness of the proposed algorithm.
Similar content being viewed by others
References
Aguilar-Ponce, R., Kumar, A., Tecpanecatl-Xihuitl, J.L., Bayoumi, M.: A network of sensor-based framework for automated visual surveillance. J. Netw. Comput. Appl. 30(3), 1244–1271 (2007)
Arleo, A., Smeraldi, F., Gerstner, W.: Cognitive navigation based on nonuniform gabor space sampling, unsupervised growing networks, and reinforcement learning. IEEE Trans. Neural Netw. 15(3), 639–652 (2004)
Bustamante, A.L., Molina, J.M., Patricio, M.A.: A practical approach for active camera coordination based on a fusion-driven multi-agent system. Int. J. Syst. Sci. 45(4), 741–755 (2014)
Chen, F., Ren, W., Cao, Y.: Surrounding control in cooperative agent networks. Syst. Control Lett. 59 (11), 704–712 (2010)
Degris, T., White, M., Sutton, R.S.: Off-policy actor-critic. In 29th International Conference on Machine Learning (2012)
Farinelli, A., Iocchi, L., Nardi, D.: Distributed on-line dynamic task assignment for multi-robot patrolling. Auton. Robot. 41(6), 1–25 (2016)
Finke, J., Passino, K.M., Sparks, A.G.: Stable task load balancing strategies for cooperative control of networked autonomous air vehicles. IEEE Trans. Control Syst. Technol. 14(5), 789–803 (2006)
Foerster, J., Assael, I.A., de Freitas, N., Whiteson, S.: Learning to Communicate with Deep Multi-Agent Reinforcement Learning. In: Advances in Neural Information Processing Systems, pp. 2137–2145 (2016)
Foerster, J., Farquhar, G., Afouras, T., Nardelli, N., Whiteson, S.: Counterfactual multi-agent policy gradients. arXiv:1705.08926 (2017)
Foerster, J., Nardelli, N., Farquhar, G., Afouras, T., Torr, P.H., Kohli, P., Whiteson, S.: Stabilising experience replay for deep multi-agent reinforcement learning. arXiv:1702.08887 (2017)
Franchi, A., Stegagno, P., Oriolo, G.: Decentralized multi-robot encirclement of a 3d target with guaranteed collision avoidance. Auton. Robot. 40(2), 1–21 (2016)
Hafez, A.T., Iskandarani, M., Givigi, S.N., Yousefi, S., Beaulieu, A.: Uavs in formation and dynamic encirclement via model predictive control. IFAC Proc. 47(3), 1241–1246 (2014)
Hausman, K., Mueller, J., Hariharan, A.: Cooperative multi-robot control for target tracking with onboard sensing. Int. J. Robot. Res. 34, (2015)
He, D., Xia, Y., Qin, T., Wang, L., Yu, N., Liu, T., Ma, W.Y.: Dual Learning for Machine Translation. In: Advances in Neural Information Processing Systems, pp. 820–828 (2016)
He, H., Boyd-Graber, J., Kwok, K., Daumé, H., III: Opponent Modeling in Deep Reinforcement Learning. In: International Conference on Machine Learning, pp. 1804–1813 (2016)
Iida, S., Kanoh, M., Kato, S., Itoh, H.: Reinforcement Learning for Motion Control of Humanoid Robots. In: Ieee/Rsj International Conference on Intelligent Robots and Systems, vol.4, pp. 3153–3157 (2004)
Kim, T., Hara, S., Hori, Y.: Cooperative control of multi-agent dynamical systems in target-enclosing operations using cyclic pursuit strategy. Int. J. Control. 83(10), 2040–2052 (2010)
Lan, Y., Yan, G., Lin, Z.: Distributed control of cooperative target enclosing based on reachability and invariance analysis. Syst. Control Lett. 59(7), 381–389 (2010)
Leibo, J.Z., Zambaldi, V., Lanctot, M., Marecki, J., Graepel, T.: Multi-agent reinforcement learning in sequential social dilemmas. In: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pp. 464–473. International Foundation for Autonomous Agents and Multiagent Systems (2017)
Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv:1509.02971 (2015)
Liu, L., Luo, C., Shen, F.: Multi-Agent Formation Control with Target Tracking and Navigation. In: IEEE International Conference on Information and Automation (2017)
Long, P., Fanl, T., Liao, X., Liu, W., Zhang, H., Pan, J.: Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 6252–6259. IEEE (2018)
Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, O.P., Mordatch, I.: Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. In: Advances in Neural Information Processing Systems, pp. 6379–6390 (2017)
Macwan, A., Vilela, J., Nejat, G., Benhabib, B.: A multirobot path-planning strategy for autonomous wilderness search and rescue. IEEE Trans. Cybern. 45(9), 1784–1797 (2015)
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529 (2015)
Omidshafiei, S., Pazis, J., Amato, C., How, J.P., Vian, J.: Deep decentralized multi-task multi-agent reinforcement learning under partial observability. arXiv:1703.06182 (2017)
Parrish, J.K., Viscido, S.V., Grunbaum, D.: Self-organized fish schools: an examination of emergent properties. Biol. Bullet. 202(3), 296–305 (2002)
Sarwal, A., Agrawal, D., Chaudhary, S.: Surveillance in an Open Environment by Co-Operative Tracking Amongst Sensor Enabled Robots. In: 2007. ICIA’07. International Conference On Information Acquisition, pp. 345–349. IEEE (2007)
Sato, K., Maeda, N.: Target-Enclosing Strategies for Multi-Agent Using Adaptive Control Strategy. In: IEEE International Conference on Control Applications, pp. 1761–1766 (2010)
Shi, Y.J., Li, R., Teo, K.L.: Rotary enclosing control of second-order multi-agent systems for a group of targets. Int. J. Syst. Sci. 48, 13–21 (2017)
Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al.: Mastering the game of go with deep neural networks and tree search. Nature 529(7587), 484 (2016)
Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., Riedmiller, M.: Deterministic Policy Gradient Algorithms. Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 387–395 (2014)
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A.: Mastering the game of go without human knowledge. Nature 550(7676), 354–359 (2017)
Su, P.H., Gasic, M., Mrksic, N., Rojas-Barahona, L., Ultes, S., Vandyke, D., Wen, T.H., Young, S.: On-line active reward learning for policy optimisation in spoken dialogue systems. arXiv:1605.07669 (2016)
Sukhbaatar, S., Fergus, R., et al.: Learning Multiagent Communication with Backpropagation. In: Advances in Neural Information Processing Systems, pp. 2244–2252 (2016)
Sutton, R., Barto, A.: Reinforcement Learning:An introduction. MIT Press, Cambridge (1998)
Tampuu, A., Matiisen, T., Kodelja, D., Kuzovkin, I., Korjus, K., Aru, J., Aru, J., Vicente, R.: Multiagent cooperation and competition with deep reinforcement learning, vol. 12 (2017)
Wang, C., Xie, G., Cao, M.: Forming circle formations of anonymous mobile agents with order preservation. IEEE Trans. Autom. Control 58(12), 3248–3254 (2013)
Wang, C., Xie, G., Cao, M.: Controlling anonymous mobile agents with unidirectional locomotion to form formations on a circle. Automatica 50(4), 1100–1108 (2014)
Wang, Z., Schaul, T., Hessel, M., Van Hasselt, H., Lanctot, M., De Freitas, N.: Dueling network architectures for deep reinforcement learning. arXiv:1511.06581 (2015)
Xiao, J., Xiong, D., Yao, W., Yu, Q., Lu, H., Zheng, Z.: Building Software System and Simulation Environment for RoboCup MSL Soccer Robots Based on ROS and Gazebo. Springer International Publishing (2017)
Yao, W., Lu, H., Zeng, Z., Xiao, J., Zheng, Z.: Distributed static and dynamic circumnavigation control with arbitrary spacings for a heterogeneous multi-robot system. Journal of Intelligent & Robotic Systems 94, 883–905 (2019)
Zhang, Y., Parker, L.E.: Multi-Robot Task Scheduling. In: IEEE International Conference on Robotics and Automation, pp. 2992–2998 (2016)
Zheng, Y., Luo, S., Lv, Z.: Control Double Inverted Pendulum by Reinforcement Learning with Double Cmac Network. In: International Conference on Pattern Recognition, pp. 639–642 (2006)
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Junchong Ma and Huimin Lu contributed equally to this work.
Electronic supplementary material
Below is the link to the electronic supplementary material.
(MP4 20.7 MB)
Rights and permissions
About this article
Cite this article
Ma, J., Lu, H., Xiao, J. et al. Multi-robot Target Encirclement Control with Collision Avoidance via Deep Reinforcement Learning. J Intell Robot Syst 99, 371–386 (2020). https://doi.org/10.1007/s10846-019-01106-x
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10846-019-01106-x