Using evolution strategies to solve DEC-POMDP problems

  • Original Paper
Soft Computing

Abstract

The decentralized partially observable Markov decision process (DEC-POMDP) is a framework for modeling multi-robot decision-making problems under uncertainty. Because the problem is NEXP-complete, there is no efficient exact algorithm for it, and despite the attention it has received recently, only a few approximate solutions, all limited to small problems, have been proposed so far. In this study, we offer a novel approximate solution algorithm for DEC-POMDP problems based on evolution strategies, together with a novel approach for approximately calculating the fitness of the chromosomes, which corresponds to the expected reward. We also propose and solve a new problem, a more complex, modified version of the grid meeting problem. Our results show that the algorithm is scalable and can solve problems with more states than those attempted in previous studies.
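To make the idea in the abstract concrete, the following is a minimal sketch of how an evolution strategy with an approximate fitness might be applied to a two-agent problem. It is not the authors' algorithm: the abstract does not specify the chromosome encoding or the fitness approximation, so everything here is an assumption chosen for illustration. The toy problem (two agents on a one-dimensional corridor, rewarded for occupying the same cell), the memoryless policy encoding, the plain sample-average fitness, and all names (`act`, `rollout`, `fitness`, `evolve`, `N_CELLS`, `SLIP`) are hypothetical.

```python
import random

N_CELLS = 5           # cells on a 1-D corridor (toy stand-in, not the paper's grid)
ACTIONS = (-1, 0, 1)  # move left, stay, move right
HORIZON = 8           # steps per episode
SLIP = 0.1            # probability that a move fails and the agent stays put


def act(genes, agent, obs):
    """Pick the action with the highest preference for this agent and observation."""
    base = (agent * N_CELLS + obs) * len(ACTIONS)
    prefs = genes[base:base + len(ACTIONS)]
    return ACTIONS[prefs.index(max(prefs))]


def rollout(genes, rng):
    """Simulate one episode; each agent observes only its own position."""
    pos = [rng.randrange(N_CELLS), rng.randrange(N_CELLS)]
    total = 0.0
    for _ in range(HORIZON):
        for agent in (0, 1):
            move = act(genes, agent, pos[agent])
            if rng.random() < SLIP:
                move = 0
            pos[agent] = min(N_CELLS - 1, max(0, pos[agent] + move))
        total += 1.0 if pos[0] == pos[1] else 0.0
    return total


def fitness(genes, rng, n_samples=30):
    """Approximate the expected reward by averaging sampled episodes."""
    return sum(rollout(genes, rng) for _ in range(n_samples)) / n_samples


def evolve(mu=5, lam=20, generations=30, sigma=0.5, seed=0):
    """(mu + lambda) evolution strategy with Gaussian mutation."""
    rng = random.Random(seed)
    n_genes = 2 * N_CELLS * len(ACTIONS)
    parents = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(mu)]
    for _ in range(generations):
        # Each child is a randomly chosen parent plus Gaussian noise.
        children = [[g + rng.gauss(0, sigma) for g in rng.choice(parents)]
                    for _ in range(lam)]
        # Parents and children compete; the noisy fitness is re-sampled each time.
        pool = parents + children
        parents = sorted(pool, key=lambda c: fitness(c, rng), reverse=True)[:mu]
    return parents[0]


if __name__ == "__main__":
    best = evolve()
    print(fitness(best, random.Random(1), n_samples=500))
```

In this sketch a single chromosome holds the policies of both agents, so selection acts on the joint policy while each agent still chooses its action from its own observation only. Averaging sampled episodes keeps the cost of one fitness evaluation independent of the number of states, at the price of a noisy estimate.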

Author information

Corresponding author

Correspondence to Barış Eker.

Additional information

This work is supported by Boğaziçi University Scientific Research Projects Grant 06HA102 and by TUBITAK with Project Grant 106E172.

About this article

Cite this article

Eker, B., Akın, H.L. Using evolution strategies to solve DEC-POMDP problems. Soft Comput 14, 35–47 (2010). https://doi.org/10.1007/s00500-008-0388-7

  • DOI: https://doi.org/10.1007/s00500-008-0388-7
