Abstract
An inherent difficulty in dynamic distributed constraint optimization problems (dynamic DCOPs) is that an assignment must be made at the current time while future events remain uncertain. This dependency between current assignments and future events has not been well addressed in the literature. This paper proposes a reinforcement-learning-based solver for dynamic DCOPs. We show that reinforcement learning techniques offer an alternative way to solve the problem over time and are computationally more efficient than sequential DCOP solvers. We also use a novel heuristic to obtain correct results and describe the formalism we adopt to model dynamic DCOPs with cooperative agents. We evaluate the approach experimentally on the dynamic weapon target assignment (dynamic WTA) problem. We observe that, after convergence, the dynamic WTA system remains in a safe zone while satisfying the constraints. Moreover, the agents implemented in the experiments eventually converge to the correct assignment.
Data availability
All data generated or analyzed during this study are included in this paper.
Funding
The authors declare that no funds, grants or other support were received during the preparation of this manuscript.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose. The authors declare that no specific data set has been used.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Shokoohi, M., Afsharchi, M. & Shah-Hoseini, H. Dynamic distributed constraint optimization using multi-agent reinforcement learning. Soft Comput 26, 3601–3629 (2022). https://doi.org/10.1007/s00500-022-06820-7
DOI: https://doi.org/10.1007/s00500-022-06820-7