Distributed Learning for Planning Under Uncertainty Problems with Heterogeneous Teams

Scaling Up Multiagent Planning with Distributed Learning and Approximate Representations

Published in: Journal of Intelligent & Robotic Systems

Abstract

This paper considers the problem of multiagent sequential decision making under uncertainty and incomplete knowledge of the state transition model. A distributed learning framework is proposed in which each agent learns an individual model and shares the results with the team. The challenges associated with this approach include choosing the model representation for each agent and sharing these representations effectively under limited communication. A decentralized extension of the model learning scheme based on Incremental Feature Dependency Discovery (Dec-iFDD) is presented to address the distributed learning problem. The representation selection problem is solved by leveraging iFDD’s property of adjusting the model complexity based on the observed data. The model sharing problem is addressed by having each agent rank the features of its representation based on the model reduction error and broadcast the most relevant features to its teammates. The algorithm is tested on multiagent block building and persistent search and track missions. The results show that the proposed distributed learning scheme is particularly useful in heterogeneous learning settings, where each agent learns a significantly different model. We show through large-scale planning-under-uncertainty simulations and flight experiments with state-dependent actuator and fuel-burn-rate uncertainty that our planning approach can outperform planners that do not account for heterogeneity between agents.
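
The model-sharing step described above lends itself to a short illustration. The following is a minimal, hypothetical Python sketch, not the authors’ Dec-iFDD implementation: each agent scores the features of its learned linear model by the increase in prediction error incurred when a feature is dropped (standing in for the model reduction error), then broadcasts the top-k feature indices to its teammates. All names, the data layout, and the leave-one-out scoring rule are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of the feature-ranking-and-broadcast step of
# Dec-iFDD as described in the abstract; names and the scoring rule
# are illustrative assumptions, not the authors' implementation.

def rank_features(phi, weights, targets):
    """Score each feature by a model reduction error: the increase in
    mean-squared prediction error when that feature is dropped from
    the agent's linear transition model.

    phi     : (n_samples, n_features) feature matrix
    weights : (n_features,) learned model weights
    targets : (n_samples,) observed transition targets
    """
    full_error = np.mean((phi @ weights - targets) ** 2)
    scores = np.empty(phi.shape[1])
    for j in range(phi.shape[1]):
        w = weights.copy()
        w[j] = 0.0  # drop feature j from the model
        scores[j] = np.mean((phi @ w - targets) ** 2) - full_error
    return scores  # larger score = more relevant feature

def select_broadcast(scores, feature_ids, k):
    """Pick the k most relevant features to share with teammates."""
    order = np.argsort(scores)[::-1]
    return [feature_ids[i] for i in order[:k]]

if __name__ == "__main__":
    # Toy demo on synthetic binary features and a noisy linear target.
    rng = np.random.default_rng(0)
    phi = rng.integers(0, 2, size=(200, 10)).astype(float)
    true_w = rng.normal(size=10)
    targets = phi @ true_w + 0.01 * rng.normal(size=200)
    scores = rank_features(phi, true_w, targets)
    print(select_broadcast(scores, list(range(10)), k=3))  # top-3 ids
```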



Author information

Corresponding author: N. Kemal Ure.


About this article

Cite this article

Ure, N.K., Chowdhary, G., Chen, Y.F. et al. Distributed Learning for Planning Under Uncertainty Problems with Heterogeneous Teams. J Intell Robot Syst 74, 529–544 (2014). https://doi.org/10.1007/s10846-013-9980-x

