Abstract
Planning for multi-agent systems, such as task assignment for teams of fuel-limited unmanned aerial vehicles (UAVs), is challenging due to uncertainties in the assumed models and the very large size of the planning space. Researchers have developed fast cooperative planners based on simple models (e.g., linear and deterministic dynamics), but inaccuracies in the assumed models degrade the resulting performance. Learning techniques can adapt the model and asymptotically provide better policies than cooperative planners, yet their exploratory nature often violates the safety conditions of the system, and they frequently require an impractically large number of interactions to perform well. This paper introduces the intelligent Cooperative Control Architecture (iCCA), a framework for combining cooperative planners with reinforcement learning techniques. iCCA improves the policy of the cooperative planner while reducing the risk and sample complexity of the learner. Empirical results in gridworld and fuel-limited UAV task assignment domains with problem sizes up to 9 billion state-action pairs verify the advantage of iCCA over pure learning and pure planning strategies.
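The core idea the abstract describes — a learner proposing actions while a cooperative planner acts as a safe fallback — can be illustrated with a toy sketch. This is not the authors' algorithm; it is a minimal, hypothetical example in which a SARSA learner on a 1-D corridor has its risky suggestions overridden by a baseline planner's action, so exploration never enters a hazard state. All names (`planner_action`, `is_risky`, the corridor domain) are illustrative assumptions.

```python
import random

random.seed(0)

# Toy 1-D corridor: states 0..4, goal at state 4, hazard at state 0.
N_STATES, GOAL, HAZARD = 5, 4, 0
ACTIONS = (-1, +1)  # move left / move right

def planner_action(s):
    """Baseline cooperative planner: always head toward the goal."""
    return +1

def is_risky(s, a):
    """Safety check: veto any action that would enter the hazard state."""
    return max(0, min(N_STATES - 1, s + a)) == HAZARD

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.95, 0.1

def choose_action(s):
    # The learner suggests an action (epsilon-greedy over Q)...
    if random.random() < epsilon:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda a: Q[(s, a)])
    # ...but a risky suggestion is replaced by the planner's safe action.
    return planner_action(s) if is_risky(s, a) else a

for episode in range(200):
    s = 2
    while s != GOAL:
        a = choose_action(s)
        s2 = max(0, min(N_STATES - 1, s + a))
        r = 1.0 if s2 == GOAL else -0.01
        a2 = choose_action(s2)
        # SARSA update on the action actually executed (planner-filtered).
        Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])
        s = s2

# After training, the greedy policy at the start state heads for the goal.
print(max(ACTIONS, key=lambda a: Q[(2, a)]))
```

Because the safety filter only replaces unsafe actions, the learner is still free to improve on the planner everywhere else — a rough analogue of how iCCA lets learning refine a planner's policy without sacrificing safety.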
Geramifard, A., Redding, J. & How, J.P. Intelligent Cooperative Control Architecture: A Framework for Performance Improvement Using Safe Learning. J Intell Robot Syst 72, 83–103 (2013). https://doi.org/10.1007/s10846-013-9826-6