Solving decentralized POMDP problems using genetic algorithms

Eker, Barış; Akın, H. Levent

doi:10.1007/s10458-012-9204-y

Solving decentralized POMDP problems using genetic algorithms

Published: 19 July 2012

Volume 27, pages 161–196, (2013)
Cite this article

Autonomous Agents and Multi-Agent Systems Aims and scope Submit manuscript

Barış Eker¹ &
H. Levent Akın¹

652 Accesses
11 Citations
Explore all metrics

Abstract

The Decentralized Partially Observable Markov Decision Process (DEC-POMDP) model addresses the multiagent planning problem in partially observable environments. Due to its high computational complexity, in general only very small size problems can be solved exactly and most researchers concentrate on approximate solution algorithms to handle more complex cases. However, many approximate solution techniques can handle large size problems only for small horizons due to their exponential memory requirements for representing the policies and searching the policy space. In this study, we offer an approximate solution algorithm called GA-FSC that uses finite state controllers (FSC) to represent a finite-horizon DEC-POMDP policy and searches the policy space using genetic algorithms. We encode FSCs into chromosomes and we use one exact and one approximate technique to calculate the fitness of the chromosomes. The exact calculation technique helps us to obtain better quality solutions with the cost of more processing time compared to the approximate fitness calculation. Our method is able to replicate the best results reported so far in the literature in most cases and it is also able to extend the reported horizons further in almost all cases when compared to optimal approaches.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Akın, H.L. (1994). Evolutionary computation: A natural answer to artificial questions. In: ANNAL, pp. 41–52.
Amato, C., Bernstein, D.S., Zilberstein, S. (2006). Optimal fixed-size controllers for decentralized POMDPs. In: AAMAS 2006 Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains.
Amato, C., Bernstein, D.S., Zilberstein, S. (2007). Optimizing memory-bounded controllers for decentralized POMDPs. In: Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligences (UAI-07), pp. 1–8.
Amato C., Bernstein D. S., Zilberstein S. (2010) Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs. Autonomous Agents and Multi-Agent Systems 21: 293–320
Article Google Scholar
Amato, C., Dibangoye, J.S., Zilberstein, S. (2009). Incremental policy generation for finite-horizon DEC-POMDPs. In: Proceedings of the International Conference on Automated Planning and Scheduling, pp. 2–9.
Amato, C., Zilberstein, S. (2008). Heuristic policy iteration for infinite-horizon decentralized POMDPs. In: Proceedings of the Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM).
Aras R., Dutech A. (2010) An investigation into mathematical programming for finite horizon decentralized POMDPs. The Journal of Artificial Intelligence Research 37: 329–396
MathSciNet MATH Google Scholar
Astrom K. (1965) Optimal control of Markov decision process with incomplete state estimation. Journal of Mathematical Analysis and Applications 10: 174–205
Article MathSciNet Google Scholar
Becker R., Zilberstein S., Goldman C. V. (2004) Solving transition independent decentralized markov decision processes. Journal of Artificial Intelligence Research 22: 423–455
MathSciNet MATH Google Scholar
Bernstein D. S., Amato C., Hansen E. A., Zilberstein S. (2009) Policy iteration for decentralized control of markov decision processes. Journal of Articial Intelligence Research 34: 89–132
MathSciNet MATH Google Scholar
Bernstein, D. S., Hansen, E., Zilberstein, S. (2005). Bounded policy iteration for decentralized POMDPs. In: International Joint Conference on Artificial Intelligence (IJCAI) (pp. 1287–1292). Scotland: Edinburgh.
Bernstein, D. S., Zilberstein, S., Immerman, N. (2000). The complexity of decentralized control of markov decision processes. In: Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI).
Besse, C., Chaib-draa, B. (2010). Quasi-deterministic POMDPs and Dec-POMDPs. In: In Proceedings of 5th International Workshop On Multiagent Sequential Decision Making in Uncertain Domains.
Boger J., Hoey J., Poupart P., Boutilier C., Fernie G., Mihailidis A. (2006) A planning system based on markov decision processes to guide people with dementia through activities of daily living. IEEE Transactions on Information Technology and Biomedicine 10: 323–333
Article Google Scholar
Boularias, A., Chaib-draa, B. (2008). Exact dynamic programming for decentralized POMDPs with lossless policy compression. In: International Conference on Automated Planning and Scheduling.
Boularias, A., Chaib-draa, B. (2008). Planning in decentralized POMDPs with predictive policy representations. In: ICAPS’08 Multiagent Planning Workshop (MASPLAN’08).
Carlin, A., Zilberstein, S. (2008). Value-based observation compression for DEC-POMDPs. In: In Proceedings of the 7th International Joint Conference on Autonomous agents and multiagent systems, pp. 501–508.
Cohen P. R. (1995) Empirical methods for artificial intelligence. MIT Press, Massachusetts
MATH Google Scholar
Dibangoye, J. S., Chai-draa, B., Mouaddib, A. I. (2009). Policy iteration algorithms for DEC-POMDPs with discounted rewards. In: Eighth International AAMAS Workshop on MSDM.
Dibangoye, J. S., Mouaddib, A., Chai-draa, B. (2009). Point-based incremental pruning heuristic for solving finite-horizon DEC-POMDPs. In: In Proceedings of the Eighth International Joint Conference on Autonomous Agents andMultiagent Systems, pp. 569–576.
Eker B., Akın H. (2010) Using evolution strategies to solve DEC-POMDP problems. Soft Computing-A Fusion of Foundations, Methodologies and Applications 14(1): 35–47
Google Scholar
Fogel I., Owens A. J., Walsh M. (1966) Artificial intelligence through simulated evolution. Wiley, New York
MATH Google Scholar
Hansen, E. A., Bernstein, D. S., Zilberstein, S. (2004). Dynamic programming for partially observable stochastic games. In: 19th National Conference on Artificial Intelligence (AAAI-04), San Jose, CA (pp. 709–715).
Holland J. (1975) Adaptation in natural, artificial systems. University of Michigan Press, Ann Arbor
Google Scholar
Jimenez, F., Sanchez, G., Vasant, P., Vergeday, J. L. (2006). A multi-objective evolutionary approach for fuzzy optimization in production planning. In: IEEE International Conference on Systems, Man, and Cybernetics, pp. 3120–3125.
Kaelbling L. P., Littman M. L., Cassandra A. R. (1995) Planning and acting in partially observable stochastic domains. Artificial Intelligence 101: 99–134
Article MathSciNet Google Scholar
Koza, J. (1992). Genetic programming: On the programming of computers by means of natural selection. Cambridge: MIT Press. ISBN 0-262-11170-5.
Kube C. R., Zhang H. (1997) Task modelling in collective robotics. Autonomous Robots 4: 53–72
Article Google Scholar
Kumar, A., Zilberstein, S. (2009). Constraint-based dynamic programming for decentralized POMDPs with structured interactions. In: Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems, Vol. 1, AAMAS ’09, pp. 561–568.
Kumar, A., Zilberstein, S. (2010). Point-based backup for decentralized POMDPs: Complexity and new algorithms. In: Proceedings of the Ninth International Conference on Autonomous Agents and Multiagent Systems, Toronto, Canada, pp. 1315–1322.
LaValle S. (2006) Planning algorithms. Cambridge University Press, Cambridge
Book MATH Google Scholar
Lin, A. Z. Z., Bean, J. C., White, C. C. (1998). Genetic algorithm heuristics for finite horizon partially observed markov decision problems. Technical Report, University of Michigan, Ann Arbor.
Lopez, E., Barea, R., Bergasa, L., Escudero, M. (2003). Visually augmented POMDP for indoor robot navigation. In: 21th IASTED International Multi-Conference on Applied Informatics.
Mazurowski, M., Zurada, J. (2007). Solving decentralized multi-agent control problems with genetic algorithms. In: IEEE Congress on Evolutionary Computation, pp. 1029–1034.
Nair, R., Tambe, M., Yokoo, M., Pynadath, D., Marsella, S. (2003). Taming decentralized POMDPs: Towards efficient policy computation for multiagent settings. In: Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03), pp. 705–711.
Nair, R., Varakantham, P., Tambe, M., Yokoo, M. (2005). Networked distributed POMDPs: A synthesis of distributed constraint optimization and POMDPs. In: Proceedings of the 20th National Conference on Artificial Intelligence, pp. 133–139.
Oliehoek, F. A. (2012). Decentralized POMDPs. In: M. Wiering, & M. V. Otterlo (Eds.), Reinforcement learning: State of the art. Berlin, Germany: Springer
Oliehoek, F. A., Kooij, J. F. P., Vlassis, N. (2007). A cross-entropy approach to solving DEC-POMDPs. In: 1st International Symposium on Intelligent and Distributed Computing.
Oliehoek F. A., Kooij J. F. P., Vlassis N. (2008) The cross-entropy method for policy search in decentralized POMDPs. Informatica 32: 341–357
MathSciNet MATH Google Scholar
Oliehoek F. A., Spaan M. T. J., Vlassis N. (2008) Optimal and approximate q-value functions for decentralized POMDPs. Journal of Artificial Intelligence Research 32: 289–353
MathSciNet MATH Google Scholar
Oliehoek, F. A., Vlassis, N. (2007). Q-value functions for decentralized POMDPs. In: Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, 2007, pp. 833–840.
Oliehoek, F. A., Whiteson, S., Spaan, M. T. J. (2009). Lossless clustering of histories in decentralized POMDPs. In: Proceedings of the 8th International Joint Conference on Autonomous Agents and Multi-Agent Systems, pp. 577–584.
Papadimitriou C. H., Tsitsiklis J. N. (1987) The complexity of markov decision processes. Mathematics of Operations Research 12: 441–450
Article MathSciNet MATH Google Scholar
Poupart, P., Boutilier, C. (2003). Bounded finite state controllers. In Proceedings of Advances in Neural Information Processing Systems, Vol. 16.
Pynadath D., Tambe M. (2002) The communicative multiagent team decision problem: Analyzing teamwork theories, models. Journal of Artificial Intelligence Research 16: 389–423
MathSciNet MATH Google Scholar
Rabinovich, Z., Goldman, C. V., Rosenschein, J. S. (2003). The complexity of multiagent systems: The price of silence. In: Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems(AAMAS), pp. 1102–1103.
Rechenberg I. (1973) Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Fromman-Holzboog Verlag, Stuttgart
Google Scholar
Regan, K., Cohen, R., Poupart, P. (2005). The advisor-POMDP: A principled approach to trust through reputation in electronic markets. In: In Proceedings of the Third Annual Conference on Privacy, Security and Trust (PST), pp. 121–130.
Sanchez, G., Jimenez, F., Vasant, P. (2007). Fuzzy optimization with multi-objective evolutionary algorithms: a case study. In: Proceedings of the 2007 IEEE Symposium on Computational Intelligence in Multicriteria Decision Making (MCDM), pp. 58–64.
Seuken, S., Zilberstein, S. (2007). Improved memory bounded dynamic programming for decentralized POMDPs. In: Twenty-Third Conference on Uncertainty in Artificial Intelligences (UAI-07), Vancouver, BC, Canada (pp. 344–351).
Seuken, S., Zilberstein, S. (2007). Memory bounded dynamic programming for decentralized POMDPs. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp. 2009–2015.
Spaan, M. T. J. (2011). DEC-POMDP. http://users.isr.ist.utl.pt/~mtjspaan/decpomdp/.
Spaan, M. T. J., Oliehoek, F. A., Amato, C. (2011). Scaling up optimal heuristic search in DEC-POMDPs via incremental expansion. In: Proceedings of International Joint Conference on Artificial Intelligence.
Spaan, M. T. J., Vlassis, N. (2004). A point-based POMDP algorithm for robot planning. In: Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2399–2404.
Szer, D., Charpillet, F. (2005). An optimal best-first search algorithm for solving infinite horizon DEC-POMDPs. In: ECML, pp. 389–399.
Szer, D., Charpillet, F. (2006). Point-based dynamic programming for DEC-POMDPs. In: Proceedings of the 21st National Conference on Artificial Intelligence, pp. 1233–1238.
Szer, D., Charpillet, F., Zilberstein, S. (2005). Maa*: A heuristic search algorithm for solving decentralized POMDPs. In: Twenty-First Conference on Uncertainty in Artificial Intelligence(UAI). Scotland: Edinburgh.
Wang F. (2001) Confidence interval for the mean of non-normal data. Quality, Reliability Engineering International 17: 57–267
Google Scholar
Williams, J. D., Poupart, P., Young, S. (2005). Partially observable markov decision processes with continuous observations for dialogue management. In: Proceedings of the 6th SigDial Workshop on Discourse and Dialogue.
Wu, F., Chen, X. (2008). Solving large-scale and sparse-reward DEC-POMDPs with correlation-mdps. In: RoboCup 2007: Robot Soccer World Cup XI, pp. 208–219.
Wu, F., Zilberstein, S., Chen, X. (2009). Multi-agent online planning with communication. In: Proceedings of the International Conference on Automated Planning and Scheduling, pp. 321–329. Thessaloniki, Greece.
Wu, F., Zilberstein, S., Chen, X. (2010). Point-based policy generation for decentralized POMDPs. In: Proceedings of the Ninth International Conference on Autonomous Agents and Multiagent Systems, pp. 1307–1314.
Wu F., Zilberstein S., Chen X. (2011) Online planning for multi-agent systems with bounded communication. Artificial Intelligence 175: 487–511
Article MathSciNet MATH Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Engineering, Boğaziçi University, Bebek, 34342, Istanbul, Turkey
Barış Eker & H. Levent Akın

Authors

Barış Eker
View author publications
You can also search for this author in PubMed Google Scholar
H. Levent Akın
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Barış Eker.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Eker, B., Akın, H.L. Solving decentralized POMDP problems using genetic algorithms. Auton Agent Multi-Agent Syst 27, 161–196 (2013). https://doi.org/10.1007/s10458-012-9204-y

Download citation

Published: 19 July 2012
Issue Date: July 2013
DOI: https://doi.org/10.1007/s10458-012-9204-y

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Solving decentralized POMDP problems using genetic algorithms

Abstract

Access this article

Similar content being viewed by others

Expectation-maximization for Bayes-adaptive POMDPs

Error-Bounded Approximations for Infinite-Horizon Discounted Decentralized POMDPs

A Dynamic and Task-Independent Reward Shaping Approach for Discrete Partially Observable Markov Decision Processes

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Solving decentralized POMDP problems using genetic algorithms

Abstract

Access this article

Similar content being viewed by others

Expectation-maximization for Bayes-adaptive POMDPs

Error-Bounded Approximations for Infinite-Horizon Discounted Decentralized POMDPs

A Dynamic and Task-Independent Reward Shaping Approach for Discrete Partially Observable Markov Decision Processes

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation