Abstract
Long-term inspection and maintenance (I&M) planning, a multi-stage stochastic optimization problem, can be efficiently formulated as a partially observable Markov decision process (POMDP). However, within this context, single-agent approaches do not scale well to large multi-component systems, since the joint state, action, and observation spaces grow exponentially with the number of components. To alleviate this curse of dimensionality, cooperative decentralized approaches, known as decentralized POMDPs, are often adopted and solved using multi-agent deep reinforcement learning (MADRL) algorithms. This paper examines the centralization vs. decentralization performance of MADRL formulations in I&M planning of multi-component systems. To this end, we set up a comprehensive computational experimental program focused on k-out-of-n system configurations, a common and broadly applicable archetype of deteriorating engineering systems, to highlight how MADRL strengths and pathologies manifest when optimizing global returns under varying decentralization relaxations.
References
Agarwal, R., Schwarzer, M., Castro, P.S., Courville, A.C., Bellemare, M.: Deep reinforcement learning at the edge of the statistical precipice. In: Advances in Neural Information Processing Systems, vol. 34, pp. 29304–29320. Curran Associates, Inc. (2021). https://proceedings.neurips.cc/paper/2021/hash/f514cec81cb148559cf475e7426eed5e-Abstract.html
Albrecht, S.V., Christianos, F., Schäfer, L.: Multi-Agent Reinforcement Learning: Foundations and Modern Approaches. MIT Press, Cambridge (2023)
Amato, C., Chowdhary, G., Geramifard, A., Ure, N.K., Kochenderfer, M.J.: Decentralized control of partially observable Markov decision processes. In: 52nd IEEE Conference on Decision and Control, pp. 2398–2405. IEEE, Firenze (2013). http://ieeexplore.ieee.org/document/6760239/
Andriotis, C.P., Papakonstantinou, K.G.: Managing engineering systems with large state and action spaces through deep reinforcement learning. Reliab. Eng. Syst. Saf. 191, 106483 (2019). https://doi.org/10.1016/j.ress.2019.04.036
Andriotis, C.P., Papakonstantinou, K.G.: Deep reinforcement learning driven inspection and maintenance planning under incomplete information and constraints. Reliab. Eng. Syst. Saf. 212, 107551 (2021). https://doi.org/10.1016/j.ress.2021.107551
Andriotis, C.P., Papakonstantinou, K.G., Chatzi, E.N.: Value of structural health information in partially observable stochastic environments. Struct. Saf. 93, 102072 (2021). https://doi.org/10.1016/J.STRUSAFE.2020.102072
Arcieri, G., Hoelzl, C., Schwery, O., Straub, D., Papakonstantinou, K.G., Chatzi, E.: Bridging POMDPs and Bayesian decision making for robust maintenance planning under model uncertainty: an application to railway systems. Reliab. Eng. Syst. Saf. 239, 109496 (2023). https://www.sciencedirect.com/science/article/pii/S0951832023004106
Bismut, E., Straub, D.: Optimal adaptive inspection and maintenance planning for deteriorating structural systems. Reliab. Eng. Syst. Saf. 215, 107891 (2021). https://www.sciencedirect.com/science/article/pii/S0951832021004063
Bono, G., Dibangoye, J.S., Matignon, L., Pereyron, F., Simonin, O.: Cooperative multi-agent policy gradient. In: Berlingerio, M., Bonchi, F., Gärtner, T., Hurley, N., Ifrim, G. (eds.) ECML PKDD 2018. LNCS (LNAI), vol. 11051, pp. 459–476. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-10925-7_28
Christianos, F., Papoudakis, G., Rahman, A., Albrecht, S.V.: Scaling multi-agent reinforcement learning with selective parameter sharing (2021). http://arxiv.org/abs/2102.07475, arXiv:2102.07475 [cs]
Claus, C., Boutilier, C.: The dynamics of reinforcement learning in cooperative multiagent systems. In: Proceedings of the Fifteenth National/Tenth Conference on Artificial Intelligence/Innovative Applications of Artificial Intelligence, pp. 746–752. AAAI 1998/IAAI 1998, American Association for Artificial Intelligence, USA (1998)
Deodatis, G., Fujimoto, Y., Ito, S., Spencer, J., Itagaki, H.: Non-periodic inspection by Bayesian method I. Probab. Eng. Mech. 7(4), 191–204 (1992). https://www.sciencedirect.com/science/article/pii/026689209290023B
Foerster, J., Farquhar, G., Afouras, T., Nardelli, N., Whiteson, S.: Counterfactual multi-agent policy gradients (2017). http://arxiv.org/abs/1705.08926, arXiv:1705.08926 [cs]
Fulda, N., Ventura, D.: Predicting and preventing coordination problems in cooperative Q-learning systems. In: Proceedings of the 20th International Joint Conference on Artifical Intelligence, pp. 780–785. IJCAI 2007, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2007)
Goldman, C.V., Zilberstein, S.: Decentralized control of cooperative systems: categorization and complexity analysis. J. Artif. Intell. Res. 22, 143–174 (2004). https://doi.org/10.1613/jair.1427
Grall, A., Bérenguer, C., Dieulle, L.: A condition-based maintenance policy for stochastically deteriorating systems. Reliab. Eng. Syst. Saf. 76(2), 167–180 (2002). https://doi.org/10.1016/S0951-8320(01)00148-X
Gronauer, S., Diepold, K.: Multi-agent deep reinforcement learning: a survey. Artif. Intell. Rev. 55(2), 895–943 (2022). https://doi.org/10.1007/s10462-021-09996-w
Gupta, J.K., Egorov, M., Kochenderfer, M.: Cooperative multi-agent control using deep reinforcement learning. In: Sukthankar, G., Rodriguez-Aguilar, J.A. (eds.) AAMAS 2017. LNCS (LNAI), vol. 10642, pp. 66–83. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-71682-4_5
Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artif. Intell. 101(1–2), 99–134 (1998). https://doi.org/10.1016/S0004-3702(98)00023-X
Kapur, K.C., Pecht, M.: Reliability Engineering, 1st edn. Wiley, Hoboken (2014)
Kochenderfer, M.J., Wheeler, T.A., Wray, K.H.: Algorithms for Decision Making. MIT Press, Cambridge (2022)
Kuo, W., Zuo, M.: Optimal Reliability Modeling: Principles and Applications. Wiley, Hoboken (2003). https://catalogimages.wiley.com/images/db/pdf/047139761X.07.pdf
Leroy, P., Morato, P.G., Pisane, J., Kolios, A., Ernst, D.: IMP-MARL: a suite of environments for large-scale infrastructure management planning via MARL (2023). http://arxiv.org/abs/2306.11551, arXiv:2306.11551 [cs, eess]
Luque, J., Straub, D.: Risk-based optimal inspection strategies for structural systems using dynamic Bayesian networks. Struct. Saf. 76, 68–80 (2019). https://doi.org/10.1016/j.strusafe.2018.08.002
Lyu, X., Baisero, A., Xiao, Y., Daley, B., Amato, C.: On centralized critics in multi-agent reinforcement learning. J. Artif. Intell. Res. 77, 295–354 (2023). https://www.jair.org/index.php/jair/article/view/14386
Madanat, S., Ben-Akiva, M.: Optimal inspection and repair policies for infrastructure facilities. Transp. Sci. 28(1), 55–62 (1994). https://doi.org/10.1287/trsc.28.1.55
Matignon, L., Laurent, G.J., Le Fort-Piat, N.: Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems. Knowl. Eng. Rev. 27(1), 1–31 (2012). https://www.cambridge.org/core/product/identifier/S0269888912000057/type/journal_article
Morato, P.G., Andriotis, C.P., Papakonstantinou, K.G., Rigo, P.: Inference and dynamic decision-making for deteriorating systems with probabilistic dependencies through Bayesian networks and deep reinforcement learning. Reliab. Eng. Syst. Saf. 235, 109144 (2023). https://www.sciencedirect.com/science/article/pii/S0951832023000595
Morato, P.G., Papakonstantinou, K.G., Andriotis, C.P., Nielsen, J.S., Rigo, P.: Optimal inspection and maintenance planning for deteriorating structural components through dynamic Bayesian networks and Markov decision processes. Struct. Saf. 94, 102140 (2022). https://doi.org/10.1016/j.strusafe.2021.102140
Oliehoek, F.A., Amato, C.: A Concise Introduction to Decentralized POMDPs. SpringerBriefs in Intelligent Systems, Springer, Cham (2016)
Oliehoek, F.A., Spaan, M.T.J., Vlassis, N.: Optimal and approximate Q-value functions for decentralized POMDPs. J. Artif. Intell. Res. 32, 289–353 (2008). https://doi.org/10.1613/jair.2447
Papakonstantinou, K.G., Shinozuka, M.: Planning structural inspection and maintenance policies via dynamic programming and Markov processes. Part II: POMDP implementation. Reliab. Eng. Syst. Saf. 130, 214–224 (2014). https://doi.org/10.1016/j.ress.2014.04.006
Papakonstantinou, K.G., Andriotis, C.P., Shinozuka, M.: POMDP and MOMDP solutions for structural life-cycle cost minimization under partial and mixed observability. Struct. Infrastruct. Eng. 14(7), 869–882 (2018). https://doi.org/10.1080/15732479.2018.1439973
Papoudakis, G., Christianos, F., Schäfer, L., Albrecht, S.V.: Benchmarking multi-agent deep reinforcement learning algorithms in cooperative tasks (2021). http://arxiv.org/abs/2006.07869, arXiv:2006.07869 [cs, stat]
Peng, B., et al.: FACMAC: factored multi-agent centralised policy gradients (2021). http://arxiv.org/abs/2003.06709, arXiv:2003.06709 [cs, stat]
Rashid, T., Samvelyan, M., de Witt, C.S., Farquhar, G., Foerster, J., Whiteson, S.: QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning (2018). http://arxiv.org/abs/1803.11485, arXiv:1803.11485 [cs, stat]
Shani, G., Pineau, J., Kaplow, R.: A survey of point-based POMDP solvers. Auton. Agent. Multi-Agent Syst. 27, 1–51 (2013). https://doi.org/10.1007/s10458-012-9200-2
Spaan, M.T.J., Vlassis, N.: Perseus: randomized point-based value iteration for POMDPs. J. Artif. Intell. Res. 24, 195–220 (2005)
Sunehag, P., et al.: Value-decomposition networks for cooperative multi-agent learning (2017). http://arxiv.org/abs/1706.05296
Terry, J.K., Grammel, N., Son, S., Black, B.: Parameter sharing for heterogeneous agents in multi-agent reinforcement learning (2022). http://arxiv.org/abs/2005.13625
Wang, Z., et al.: Sample efficient actor-critic with experience replay (2017). http://arxiv.org/abs/1611.01224
Wong, A., Bäck, T., Kononova, A.V., Plaat, A.: Deep multiagent reinforcement learning: challenges and directions (2022). http://arxiv.org/abs/2106.15691, arXiv:2106.15691 [cs]
Yu, C., et al.: The surprising effectiveness of PPO in cooperative, multi-agent games (2022). http://arxiv.org/abs/2103.01955, arXiv:2103.01955 [cs]
Zhu, B., Frangopol, D.M.: Risk-based approach for optimum maintenance of bridges under traffic and earthquake loads. J. Struct. Eng. 139(3), 422–434 (2013). https://doi.org/10.1061/(ASCE)ST.1943-541X.0000671
Åström, K.J.: Optimal control of Markov processes with incomplete state information. J. Math. Anal. Appl. 10(1), 174–205 (1965). https://doi.org/10.1016/0022-247X(65)90154-X
Acknowledgements
This material is based upon work supported by the TU Delft AI Labs program. The authors gratefully acknowledge this support.
Appendix
As described in the main text, we use the 4-out-of-5 setting for hyperparameter optimization and report the tuned hyperparameters in Table 4. We note that decentralized agents often exhibit instabilities when trained with large replay buffers; their replay buffers are therefore much smaller than those of their centralized counterparts. The reason is that transitions sampled at random from a large buffer may have been generated by policies that differ substantially from the current one, forcing agents to drastically revise their recently learned policy.
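For illustration, a minimal bounded replay buffer sketch is shown below; the class name and capacity values are illustrative assumptions and do not correspond to the tuned values in Table 4. A bounded FIFO buffer evicts the oldest transitions first, so sampled experience stays closer to the current policy.

```python
import random
from collections import deque


class ReplayBuffer:
    """Minimal FIFO experience replay with a hard capacity limit.

    A small capacity keeps sampled transitions close to the current
    policy, which, per the observation above, helps stabilize
    decentralized agents.
    """

    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size: int):
        # Uniform random minibatch; capped so it never exceeds current size.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


# Illustrative capacities only (not the tuned values from Table 4):
decentralized_buffer = ReplayBuffer(capacity=2_000)
centralized_buffer = ReplayBuffer(capacity=50_000)
```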

For training, we employ a variant of the actor-critic with experience replay (ACER) algorithm [41], as introduced for I&M planning in [4, 5] and outlined above. To limit the variance induced by the importance sampling weights, we clip the weights \(w_i\) at \(\bar{w}=2\).
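As a concrete illustration of this truncation step, the sketch below clips importance sampling ratios at \(\bar{w}=2\); the function name, tensor inputs, and the small numerical stabilizer are assumptions made for the sketch, not the exact implementation used in this work.

```python
import torch


def clipped_importance_weights(pi_new: torch.Tensor,
                               pi_old: torch.Tensor,
                               w_bar: float = 2.0) -> torch.Tensor:
    """Truncated importance sampling weights for off-policy correction.

    pi_new, pi_old: probabilities of the replayed actions under the
    current and behavior policies, respectively (illustrative inputs).
    Returns w_i = pi_new / pi_old, clipped from above at w_bar.
    """
    w = pi_new / (pi_old + 1e-8)       # raw importance ratios (eps avoids div by 0)
    return torch.clamp(w, max=w_bar)   # truncate at w_bar = 2 to bound variance
```

Clipping only from above bounds the variance of the off-policy gradient estimate while leaving small ratios, and hence the direction of the correction, unchanged.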
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bhustali, P., Andriotis, C.P. (2025). Assessing the Optimality of Decentralized Inspection and Maintenance Policies for Stochastically Degrading Engineering Systems. In: Oliehoek, F.A., Kok, M., Verwer, S. (eds) Artificial Intelligence and Machine Learning. BNAIC/Benelearn 2023. Communications in Computer and Information Science, vol 2187. Springer, Cham. https://doi.org/10.1007/978-3-031-74650-5_13
DOI: https://doi.org/10.1007/978-3-031-74650-5_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-74649-9
Online ISBN: 978-3-031-74650-5