Intelligent Cooperative Control Architecture: A Framework for Performance Improvement Using Safe Learning

Journal of Intelligent & Robotic Systems

Abstract

Planning for multi-agent systems, such as task assignment for teams of fuel-limited unmanned aerial vehicles (UAVs), is challenging due to uncertainties in the assumed models and the very large size of the planning space. Researchers have developed fast cooperative planners based on simple models (e.g., linear and deterministic dynamics), yet inaccuracies in the assumed models degrade the resulting performance. Learning techniques can adapt the model and asymptotically provide better policies than cooperative planners, yet they often violate the safety conditions of the system due to their exploratory nature. Moreover, they frequently require an impractically large number of interactions to perform well. This paper introduces the intelligent Cooperative Control Architecture (iCCA), a framework for combining cooperative planners with reinforcement learning techniques. iCCA improves the policy of the cooperative planner while reducing the risk and sample complexity of the learner. Empirical results in gridworld and fuel-limited UAV task-assignment domains with problem sizes of up to 9 billion state-action pairs verify the advantage of iCCA over pure learning and pure planning strategies.
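
To make the architecture concrete, the following minimal Python sketch illustrates the kind of decision loop the abstract describes: a reinforcement learner proposes a (possibly exploratory) action, a risk model vetoes unsafe proposals, and the cooperative planner's action serves as the safe fallback. This is an illustration under assumed interfaces, not the paper's implementation; all names (icca_step, propose, estimate_risk, risk_threshold) are hypothetical.

from types import SimpleNamespace

def icca_step(state, planner, learner, risk_model, risk_threshold=0.1):
    """One iCCA-style decision step (hypothetical interface): prefer the
    learner's proposal when its estimated risk is acceptable; otherwise
    fall back to the cooperative planner's action."""
    baseline = planner.action(state)    # fast cooperative planner's safe baseline
    proposal = learner.propose(state)   # learner's possibly exploratory suggestion
    if risk_model.estimate_risk(state, proposal) <= risk_threshold:
        return proposal                 # safe enough: let the learner improve the policy
    return baseline                     # too risky: keep the planner's action

# Tiny usage example with stub components.
planner = SimpleNamespace(action=lambda s: "planner_action")
learner = SimpleNamespace(propose=lambda s: "learner_action")
risk_model = SimpleNamespace(estimate_risk=lambda s, a: 0.05)
print(icca_step("s0", planner, learner, risk_model))  # -> learner_action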

Author information

Corresponding author

Correspondence to Alborz Geramifard.

About this article

Cite this article

Geramifard, A., Redding, J. & How, J.P. Intelligent Cooperative Control Architecture: A Framework for Performance Improvement Using Safe Learning. J Intell Robot Syst 72, 83–103 (2013). https://doi.org/10.1007/s10846-013-9826-6
