
Dynamic distributed constraint optimization using multi-agent reinforcement learning

  • Foundations
Soft Computing

Abstract

An inherent difficulty in dynamic distributed constraint optimization problems (dynamic DCOPs) is the uncertainty about future events when making an assignment at the current time. This dependence of current decisions on future events is not well addressed in the research community. This paper proposes a reinforcement-learning-based solver for dynamic distributed constraint optimization. We show that reinforcement learning techniques offer an alternative approach to solving the problem over time and are computationally more efficient than sequential DCOP solvers. We also use a novel heuristic to obtain correct results and describe a formalism adopted to model dynamic DCOPs with cooperative agents. We evaluate the approach experimentally on the dynamic weapon target assignment (dynamic WTA) problem. We observe that, after convergence, the dynamic WTA system remains in a safe zone while satisfying the constraints. Moreover, in our experiments the implemented agents eventually converge to the correct assignment.
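To make the setting concrete, the following is a minimal sketch, not the authors' solver, of the coordination problem at a single decision epoch of a WTA instance. It assumes the classic static WTA objective (the expected total value of targets that survive, where each assigned weapon independently destroys its target with a fixed probability) and uses independent, stateless Q-learners that share a team reward. The instance data and all names (`TARGET_VALUES`, `KILL_PROB`, `expected_surviving_value`, `independent_q_learning`) are hypothetical. The sketch deliberately omits the temporal dimension that makes the problem dynamic, which is the part the paper addresses.

```python
import itertools
import random

# Hypothetical toy instance (not taken from the paper): 2 weapons, 2 targets.
TARGET_VALUES = [10.0, 5.0]      # threat value V_j of each target
KILL_PROB = [[0.8, 0.5],         # KILL_PROB[w][j]: probability weapon w destroys target j
             [0.6, 0.9]]


def expected_surviving_value(assignment):
    """Static WTA objective: sum_j V_j * prod_{w assigned to j} (1 - p_wj). Lower is better."""
    total = 0.0
    for j, value in enumerate(TARGET_VALUES):
        survive = 1.0
        for w, target in enumerate(assignment):
            if target == j:
                survive *= 1.0 - KILL_PROB[w][j]
        total += value * survive
    return total


def brute_force_optimum():
    """Exhaustive search over all joint assignments, used as a reference answer."""
    n_weapons, n_targets = len(KILL_PROB), len(TARGET_VALUES)
    return min(itertools.product(range(n_targets), repeat=n_weapons),
               key=expected_surviving_value)


def independent_q_learning(episodes=2000, alpha=0.1, seed=0):
    """Each weapon is an independent, stateless Q-learner; all share one team reward."""
    rng = random.Random(seed)
    n_weapons, n_targets = len(KILL_PROB), len(TARGET_VALUES)
    q = [[0.0] * n_targets for _ in range(n_weapons)]
    for episode in range(episodes):
        # Epsilon decays linearly over the first half of training, then stays at 0.05.
        epsilon = max(0.05, 1.0 - episode / (0.5 * episodes))
        joint = []
        for w in range(n_weapons):
            if rng.random() < epsilon:
                joint.append(rng.randrange(n_targets))                      # explore
            else:
                joint.append(max(range(n_targets), key=lambda j: q[w][j]))  # exploit
        reward = -expected_surviving_value(joint)  # shared by every agent
        for w, j in enumerate(joint):
            q[w][j] += alpha * (reward - q[w][j])
    # Each agent reports its greedy choice after training.
    return tuple(max(range(n_targets), key=lambda j: q[w][j]) for w in range(n_weapons))


if __name__ == "__main__":
    best = brute_force_optimum()
    print("optimum:", best, expected_surviving_value(best))
    print("learned:", independent_q_learning())
```

For this toy instance the exhaustive optimum assigns weapon 0 to target 0 and weapon 1 to target 1, with an expected surviving value of about 2.5. Independent learners are not guaranteed to reach the joint optimum in general cooperative games, which is one reason a dedicated formalism and heuristic, as proposed in the paper, are needed.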


Figs. 1–16 (not included in this preview)

Data availability

All data generated or analyzed during this study are included in this paper.

Funding

The authors declare that no funds, grants or other support were received during the preparation of this manuscript.

Author information

Corresponding author

Correspondence to Mohsen Afsharchi.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose. The authors declare that no specific data set has been used.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shokoohi, M., Afsharchi, M. & Shah-Hoseini, H. Dynamic distributed constraint optimization using multi-agent reinforcement learning. Soft Comput 26, 3601–3629 (2022). https://doi.org/10.1007/s00500-022-06820-7

  • DOI: https://doi.org/10.1007/s00500-022-06820-7