Abstract
Planning for multi-agent systems, such as task assignment for teams of fuel-limited unmanned aerial vehicles (UAVs), is challenging due to uncertainties in the assumed models and the very large size of the planning space. Researchers have developed fast cooperative planners based on simple models (e.g., linear and deterministic dynamics), but inaccuracies in the assumed models degrade the resulting performance. Learning techniques can adapt the model and asymptotically provide better policies than cooperative planners, yet their exploratory nature often violates the safety conditions of the system, and they frequently require an impractically large number of interactions to perform well. This paper introduces the intelligent Cooperative Control Architecture (iCCA), a framework for combining cooperative planners with reinforcement learning techniques. iCCA improves the policy of the cooperative planner while reducing the risk and sample complexity of the learner. Empirical results in gridworld and fuel-limited UAV task assignment domains with problem sizes up to 9 billion state-action pairs verify the advantage of iCCA over pure learning and pure planning strategies.
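The core idea the abstract describes — a learner proposing actions while a cooperative planner acts as a safe fallback — can be illustrated with a toy sketch. This is not the authors' algorithm; it is a minimal, hypothetical example in which a SARSA learner on a 1-D corridor has its risky suggestions overridden by a baseline planner's action, so exploration never enters a hazard state. All names (`planner_action`, `is_risky`, the corridor domain) are illustrative assumptions.

```python
import random

random.seed(0)

# Toy 1-D corridor: states 0..4, goal at state 4, hazard at state 0.
N_STATES, GOAL, HAZARD = 5, 4, 0
ACTIONS = (-1, +1)  # move left / move right

def planner_action(s):
    """Baseline cooperative planner: always head toward the goal."""
    return +1

def is_risky(s, a):
    """Safety check: veto any action that would enter the hazard state."""
    return max(0, min(N_STATES - 1, s + a)) == HAZARD

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.95, 0.1

def choose_action(s):
    # The learner suggests an action (epsilon-greedy over Q)...
    if random.random() < epsilon:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda a: Q[(s, a)])
    # ...but a risky suggestion is replaced by the planner's safe action.
    return planner_action(s) if is_risky(s, a) else a

for episode in range(200):
    s = 2
    while s != GOAL:
        a = choose_action(s)
        s2 = max(0, min(N_STATES - 1, s + a))
        r = 1.0 if s2 == GOAL else -0.01
        a2 = choose_action(s2)
        # SARSA update on the action actually executed (planner-filtered).
        Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])
        s = s2

# After training, the greedy policy at the start state heads for the goal.
print(max(ACTIONS, key=lambda a: Q[(2, a)]))
```

Because the safety filter only replaces unsafe actions, the learner is still free to improve on the planner everywhere else — a rough analogue of how iCCA lets learning refine a planner's policy without sacrificing safety.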
Geramifard, A., Redding, J. & How, J.P. Intelligent Cooperative Control Architecture: A Framework for Performance Improvement Using Safe Learning. J Intell Robot Syst 72, 83–103 (2013). https://doi.org/10.1007/s10846-013-9826-6