Skip to main content

Advertisement

Log in

Solving decentralized POMDP problems using genetic algorithms

  • Published:
Autonomous Agents and Multi-Agent Systems Aims and scope Submit manuscript

Abstract

The Decentralized Partially Observable Markov Decision Process (DEC-POMDP) model addresses the multiagent planning problem in partially observable environments. Due to its high computational complexity, in general only very small size problems can be solved exactly and most researchers concentrate on approximate solution algorithms to handle more complex cases. However, many approximate solution techniques can handle large size problems only for small horizons due to their exponential memory requirements for representing the policies and searching the policy space. In this study, we offer an approximate solution algorithm called GA-FSC that uses finite state controllers (FSC) to represent a finite-horizon DEC-POMDP policy and searches the policy space using genetic algorithms. We encode FSCs into chromosomes and we use one exact and one approximate technique to calculate the fitness of the chromosomes. The exact calculation technique helps us to obtain better quality solutions with the cost of more processing time compared to the approximate fitness calculation. Our method is able to replicate the best results reported so far in the literature in most cases and it is also able to extend the reported horizons further in almost all cases when compared to optimal approaches.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Akın, H.L. (1994). Evolutionary computation: A natural answer to artificial questions. In: ANNAL, pp. 41–52.

  2. Amato, C., Bernstein, D.S., Zilberstein, S. (2006). Optimal fixed-size controllers for decentralized POMDPs. In: AAMAS 2006 Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains.

  3. Amato, C., Bernstein, D.S., Zilberstein, S. (2007). Optimizing memory-bounded controllers for decentralized POMDPs. In: Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligences (UAI-07), pp. 1–8.

  4. Amato C., Bernstein D. S., Zilberstein S. (2010) Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs. Autonomous Agents and Multi-Agent Systems 21: 293–320

    Article  Google Scholar 

  5. Amato, C., Dibangoye, J.S., Zilberstein, S. (2009). Incremental policy generation for finite-horizon DEC-POMDPs. In: Proceedings of the International Conference on Automated Planning and Scheduling, pp. 2–9.

  6. Amato, C., Zilberstein, S. (2008). Heuristic policy iteration for infinite-horizon decentralized POMDPs. In: Proceedings of the Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM).

  7. Aras R., Dutech A. (2010) An investigation into mathematical programming for finite horizon decentralized POMDPs. The Journal of Artificial Intelligence Research 37: 329–396

    MathSciNet  MATH  Google Scholar 

  8. Astrom K. (1965) Optimal control of Markov decision process with incomplete state estimation. Journal of Mathematical Analysis and Applications 10: 174–205

    Article  MathSciNet  Google Scholar 

  9. Becker R., Zilberstein S., Goldman C. V. (2004) Solving transition independent decentralized markov decision processes. Journal of Artificial Intelligence Research 22: 423–455

    MathSciNet  MATH  Google Scholar 

  10. Bernstein D. S., Amato C., Hansen E. A., Zilberstein S. (2009) Policy iteration for decentralized control of markov decision processes. Journal of Articial Intelligence Research 34: 89–132

    MathSciNet  MATH  Google Scholar 

  11. Bernstein, D. S., Hansen, E., Zilberstein, S. (2005). Bounded policy iteration for decentralized POMDPs. In: International Joint Conference on Artificial Intelligence (IJCAI) (pp. 1287–1292). Scotland: Edinburgh.

  12. Bernstein, D. S., Zilberstein, S., Immerman, N. (2000). The complexity of decentralized control of markov decision processes. In: Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI).

  13. Besse, C., Chaib-draa, B. (2010). Quasi-deterministic POMDPs and Dec-POMDPs. In: In Proceedings of 5th International Workshop On Multiagent Sequential Decision Making in Uncertain Domains.

  14. Boger J., Hoey J., Poupart P., Boutilier C., Fernie G., Mihailidis A. (2006) A planning system based on markov decision processes to guide people with dementia through activities of daily living. IEEE Transactions on Information Technology and Biomedicine 10: 323–333

    Article  Google Scholar 

  15. Boularias, A., Chaib-draa, B. (2008). Exact dynamic programming for decentralized POMDPs with lossless policy compression. In: International Conference on Automated Planning and Scheduling.

  16. Boularias, A., Chaib-draa, B. (2008). Planning in decentralized POMDPs with predictive policy representations. In: ICAPS’08 Multiagent Planning Workshop (MASPLAN’08).

  17. Carlin, A., Zilberstein, S. (2008). Value-based observation compression for DEC-POMDPs. In: In Proceedings of the 7th International Joint Conference on Autonomous agents and multiagent systems, pp. 501–508.

  18. Cohen P. R. (1995) Empirical methods for artificial intelligence. MIT Press, Massachusetts

    MATH  Google Scholar 

  19. Dibangoye, J. S., Chai-draa, B., Mouaddib, A. I. (2009). Policy iteration algorithms for DEC-POMDPs with discounted rewards. In: Eighth International AAMAS Workshop on MSDM.

  20. Dibangoye, J. S., Mouaddib, A., Chai-draa, B. (2009). Point-based incremental pruning heuristic for solving finite-horizon DEC-POMDPs. In: In Proceedings of the Eighth International Joint Conference on Autonomous Agents andMultiagent Systems, pp. 569–576.

  21. Eker B., Akın H. (2010) Using evolution strategies to solve DEC-POMDP problems. Soft Computing-A Fusion of Foundations, Methodologies and Applications 14(1): 35–47

    Google Scholar 

  22. Fogel I., Owens A. J., Walsh M. (1966) Artificial intelligence through simulated evolution. Wiley, New York

    MATH  Google Scholar 

  23. Hansen, E. A., Bernstein, D. S., Zilberstein, S. (2004). Dynamic programming for partially observable stochastic games. In: 19th National Conference on Artificial Intelligence (AAAI-04), San Jose, CA (pp. 709–715).

  24. Holland J. (1975) Adaptation in natural, artificial systems. University of Michigan Press, Ann Arbor

    Google Scholar 

  25. Jimenez, F., Sanchez, G., Vasant, P., Vergeday, J. L. (2006). A multi-objective evolutionary approach for fuzzy optimization in production planning. In: IEEE International Conference on Systems, Man, and Cybernetics, pp. 3120–3125.

  26. Kaelbling L. P., Littman M. L., Cassandra A. R. (1995) Planning and acting in partially observable stochastic domains. Artificial Intelligence 101: 99–134

    Article  MathSciNet  Google Scholar 

  27. Koza, J. (1992). Genetic programming: On the programming of computers by means of natural selection. Cambridge: MIT Press. ISBN 0-262-11170-5.

  28. Kube C. R., Zhang H. (1997) Task modelling in collective robotics. Autonomous Robots 4: 53–72

    Article  Google Scholar 

  29. Kumar, A., Zilberstein, S. (2009). Constraint-based dynamic programming for decentralized POMDPs with structured interactions. In: Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems, Vol. 1, AAMAS ’09, pp. 561–568.

  30. Kumar, A., Zilberstein, S. (2010). Point-based backup for decentralized POMDPs: Complexity and new algorithms. In: Proceedings of the Ninth International Conference on Autonomous Agents and Multiagent Systems, Toronto, Canada, pp. 1315–1322.

  31. LaValle S. (2006) Planning algorithms. Cambridge University Press, Cambridge

    Book  MATH  Google Scholar 

  32. Lin, A. Z. Z., Bean, J. C., White, C. C. (1998). Genetic algorithm heuristics for finite horizon partially observed markov decision problems. Technical Report, University of Michigan, Ann Arbor.

  33. Lopez, E., Barea, R., Bergasa, L., Escudero, M. (2003). Visually augmented POMDP for indoor robot navigation. In: 21th IASTED International Multi-Conference on Applied Informatics.

  34. Mazurowski, M., Zurada, J. (2007). Solving decentralized multi-agent control problems with genetic algorithms. In: IEEE Congress on Evolutionary Computation, pp. 1029–1034.

  35. Nair, R., Tambe, M., Yokoo, M., Pynadath, D., Marsella, S. (2003). Taming decentralized POMDPs: Towards efficient policy computation for multiagent settings. In: Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03), pp. 705–711.

  36. Nair, R., Varakantham, P., Tambe, M., Yokoo, M. (2005). Networked distributed POMDPs: A synthesis of distributed constraint optimization and POMDPs. In: Proceedings of the 20th National Conference on Artificial Intelligence, pp. 133–139.

  37. Oliehoek, F. A. (2012). Decentralized POMDPs. In: M. Wiering, & M. V. Otterlo (Eds.), Reinforcement learning: State of the art. Berlin, Germany: Springer

  38. Oliehoek, F. A., Kooij, J. F. P., Vlassis, N. (2007). A cross-entropy approach to solving DEC-POMDPs. In: 1st International Symposium on Intelligent and Distributed Computing.

  39. Oliehoek F. A., Kooij J. F. P., Vlassis N. (2008) The cross-entropy method for policy search in decentralized POMDPs. Informatica 32: 341–357

    MathSciNet  MATH  Google Scholar 

  40. Oliehoek F. A., Spaan M. T. J., Vlassis N. (2008) Optimal and approximate q-value functions for decentralized POMDPs. Journal of Artificial Intelligence Research 32: 289–353

    MathSciNet  MATH  Google Scholar 

  41. Oliehoek, F. A., Vlassis, N. (2007). Q-value functions for decentralized POMDPs. In: Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, 2007, pp. 833–840.

  42. Oliehoek, F. A., Whiteson, S., Spaan, M. T. J. (2009). Lossless clustering of histories in decentralized POMDPs. In: Proceedings of the 8th International Joint Conference on Autonomous Agents and Multi-Agent Systems, pp. 577–584.

  43. Papadimitriou C. H., Tsitsiklis J. N. (1987) The complexity of markov decision processes. Mathematics of Operations Research 12: 441–450

    Article  MathSciNet  MATH  Google Scholar 

  44. Poupart, P., Boutilier, C. (2003). Bounded finite state controllers. In Proceedings of Advances in Neural Information Processing Systems, Vol. 16.

  45. Pynadath D., Tambe M. (2002) The communicative multiagent team decision problem: Analyzing teamwork theories, models. Journal of Artificial Intelligence Research 16: 389–423

    MathSciNet  MATH  Google Scholar 

  46. Rabinovich, Z., Goldman, C. V., Rosenschein, J. S. (2003). The complexity of multiagent systems: The price of silence. In: Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems(AAMAS), pp. 1102–1103.

  47. Rechenberg I. (1973) Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Fromman-Holzboog Verlag, Stuttgart

    Google Scholar 

  48. Regan, K., Cohen, R., Poupart, P. (2005). The advisor-POMDP: A principled approach to trust through reputation in electronic markets. In: In Proceedings of the Third Annual Conference on Privacy, Security and Trust (PST), pp. 121–130.

  49. Sanchez, G., Jimenez, F., Vasant, P. (2007). Fuzzy optimization with multi-objective evolutionary algorithms: a case study. In: Proceedings of the 2007 IEEE Symposium on Computational Intelligence in Multicriteria Decision Making (MCDM), pp. 58–64.

  50. Seuken, S., Zilberstein, S. (2007). Improved memory bounded dynamic programming for decentralized POMDPs. In: Twenty-Third Conference on Uncertainty in Artificial Intelligences (UAI-07), Vancouver, BC, Canada (pp. 344–351).

  51. Seuken, S., Zilberstein, S. (2007). Memory bounded dynamic programming for decentralized POMDPs. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp. 2009–2015.

  52. Spaan, M. T. J. (2011). DEC-POMDP. http://users.isr.ist.utl.pt/~mtjspaan/decpomdp/.

  53. Spaan, M. T. J., Oliehoek, F. A., Amato, C. (2011). Scaling up optimal heuristic search in DEC-POMDPs via incremental expansion. In: Proceedings of International Joint Conference on Artificial Intelligence.

  54. Spaan, M. T. J., Vlassis, N. (2004). A point-based POMDP algorithm for robot planning. In: Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2399–2404.

  55. Szer, D., Charpillet, F. (2005). An optimal best-first search algorithm for solving infinite horizon DEC-POMDPs. In: ECML, pp. 389–399.

  56. Szer, D., Charpillet, F. (2006). Point-based dynamic programming for DEC-POMDPs. In: Proceedings of the 21st National Conference on Artificial Intelligence, pp. 1233–1238.

  57. Szer, D., Charpillet, F., Zilberstein, S. (2005). Maa*: A heuristic search algorithm for solving decentralized POMDPs. In: Twenty-First Conference on Uncertainty in Artificial Intelligence(UAI). Scotland: Edinburgh.

  58. Wang F. (2001) Confidence interval for the mean of non-normal data. Quality, Reliability Engineering International 17: 57–267

    Google Scholar 

  59. Williams, J. D., Poupart, P., Young, S. (2005). Partially observable markov decision processes with continuous observations for dialogue management. In: Proceedings of the 6th SigDial Workshop on Discourse and Dialogue.

  60. Wu, F., Chen, X. (2008). Solving large-scale and sparse-reward DEC-POMDPs with correlation-mdps. In: RoboCup 2007: Robot Soccer World Cup XI, pp. 208–219.

  61. Wu, F., Zilberstein, S., Chen, X. (2009). Multi-agent online planning with communication. In: Proceedings of the International Conference on Automated Planning and Scheduling, pp. 321–329. Thessaloniki, Greece.

  62. Wu, F., Zilberstein, S., Chen, X. (2010). Point-based policy generation for decentralized POMDPs. In: Proceedings of the Ninth International Conference on Autonomous Agents and Multiagent Systems, pp. 1307–1314.

  63. Wu F., Zilberstein S., Chen X. (2011) Online planning for multi-agent systems with bounded communication. Artificial Intelligence 175: 487–511

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Barış Eker.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Eker, B., Akın, H.L. Solving decentralized POMDP problems using genetic algorithms. Auton Agent Multi-Agent Syst 27, 161–196 (2013). https://doi.org/10.1007/s10458-012-9204-y

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10458-012-9204-y

Keywords

Navigation