Dynamic distributed constraint optimization using multi-agent reinforcement learning

Shokoohi, Maryam; Afsharchi, Mohsen; Shah-Hoseini, Hamed

doi:10.1007/s00500-022-06820-7

Foundations
Published: 16 March 2022

Volume 26, pages 3601–3629, (2022)

Soft Computing

Abstract

An inherent difficulty in dynamic distributed constraint optimization problems (dynamic DCOPs) is that future events are uncertain when an assignment must be made at the current time. This dependency has not been well addressed by the research community. This paper proposes a reinforcement-learning-based solver for dynamic distributed constraint optimization. We show that reinforcement learning techniques are an alternative approach to solving the problem over time and are computationally more efficient than sequential DCOP solvers. We also use a novel heuristic to obtain correct results and describe a formalism for modeling dynamic DCOPs with cooperative agents. We evaluate the approach experimentally on the dynamic weapon target assignment (dynamic WTA) problem. We observe that, in the dynamic WTA problem, the system remains in a safe zone after convergence while satisfying the constraints. Moreover, the agents implemented in the experiments eventually converge to the correct assignment.
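
To make the idea of cooperative agents learning an assignment concrete, the following is a minimal sketch only, not the solver proposed in the paper. It assumes a static toy instance with two weapons and two targets, independent stateless Q-learning, and a shared team reward; the names KILL_PROB, TARGET_VALUE, team_reward, greedy and train, and all numeric values, are hypothetical and chosen for illustration. The dynamic aspects of the problem and the paper's heuristic are not modeled.

```python
import random

# Hypothetical toy instance (not from the paper): 2 weapons, 2 targets.
KILL_PROB = [[0.9, 0.2],   # KILL_PROB[w][t]: chance weapon w destroys target t
             [0.2, 0.9]]
TARGET_VALUE = [1.0, 1.0]  # assumed value of each target


def team_reward(assignment):
    """Expected destroyed value when weapon w fires at target assignment[w]."""
    total = 0.0
    for t, value in enumerate(TARGET_VALUE):
        survive = 1.0
        for w, chosen in enumerate(assignment):
            if chosen == t:
                survive *= 1.0 - KILL_PROB[w][t]
        total += value * (1.0 - survive)
    return total


def greedy(q_row):
    """Index of the largest Q-value (ties go to the lowest index)."""
    return max(range(len(q_row)), key=lambda a: q_row[a])


def train(episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Independent, stateless Q-learning: one Q-value per (weapon, target)."""
    rng = random.Random(seed)
    n_weapons, n_targets = len(KILL_PROB), len(TARGET_VALUE)
    q = [[0.0] * n_targets for _ in range(n_weapons)]
    for _ in range(episodes):
        # Each agent picks its own target epsilon-greedily,
        # without observing the other agent's choice.
        joint = [
            rng.randrange(n_targets) if rng.random() < epsilon else greedy(q[w])
            for w in range(n_weapons)
        ]
        reward = team_reward(joint)  # shared (cooperative) reward signal
        for w, a in enumerate(joint):
            q[w][a] += alpha * (reward - q[w][a])
    # Report each agent's greedy target after learning.
    return [greedy(row) for row in q]
```

Calling train() returns one target index per weapon. In this toy instance each weapon has a clearly better target, so the independent learners are expected to settle on the assignment that covers both targets; harder instances with real coordination conflicts would need more than this sketch provides.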

Data availability

All data generated or analyzed during this study are included in this paper.

Funding

The authors declare that no funds, grants or other support were received during the preparation of this manuscript.

Author information

Authors and Affiliations

Department of Mechanic, Electrical and Computer, Science and Research Branch, Islamic Azad University, Tehran, Iran
Maryam Shokoohi & Hamed Shah-Hoseini
Department of Electrical and Computer Engineering, University of Zanjan, Zanjan, Iran
Mohsen Afsharchi

Authors

Maryam Shokoohi
Mohsen Afsharchi
Hamed Shah-Hoseini

Corresponding author

Correspondence to Mohsen Afsharchi.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose. The authors declare that no specific data set has been used.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shokoohi, M., Afsharchi, M. & Shah-Hoseini, H. Dynamic distributed constraint optimization using multi-agent reinforcement learning. Soft Comput 26, 3601–3629 (2022). https://doi.org/10.1007/s00500-022-06820-7

Accepted: 21 January 2022
Published: 16 March 2022
Issue Date: April 2022
DOI: https://doi.org/10.1007/s00500-022-06820-7
