Abstract
Long-term inspection and maintenance (I&M) planning, a multi-stage stochastic optimization problem, can be efficiently formulated as a partially observable Markov decision process (POMDP). However, within this context, single-agent approaches do not scale well to large multi-component systems, since the joint state, action, and observation spaces grow exponentially with the number of components. To alleviate this curse of dimensionality, cooperative decentralized approaches, known as decentralized POMDPs, are often adopted and solved using multi-agent deep reinforcement learning (MADRL) algorithms. This paper examines the centralization vs. decentralization performance of MADRL formulations in I&M planning of multi-component systems. To this end, we set up a comprehensive computational experimental program focused on k-out-of-n system configurations, a common and broadly applicable archetype of deteriorating engineering systems, to highlight how MADRL strengths and pathologies manifest when optimizing global returns under varying decentralization relaxations.
References
Agarwal, R., Schwarzer, M., Castro, P.S., Courville, A.C., Bellemare, M.: Deep reinforcement learning at the edge of the statistical precipice. In: Advances in Neural Information Processing Systems, vol. 34, pp. 29304–29320. Curran Associates, Inc. (2021). https://proceedings.neurips.cc/paper/2021/hash/f514cec81cb148559cf475e7426eed5e-Abstract.html
Albrecht, S.V., Christianos, F., Schäfer, L.: Multi-Agent Reinforcement Learning: Foundations and Modern Approaches. MIT Press, Cambridge (2023)
Amato, C., Chowdhary, G., Geramifard, A., Ure, N.K., Kochenderfer, M.J.: Decentralized control of partially observable Markov decision processes. In: 52nd IEEE Conference on Decision and Control, pp. 2398–2405. IEEE, Firenze (2013). http://ieeexplore.ieee.org/document/6760239/
Andriotis, C.P., Papakonstantinou, K.G.: Managing engineering systems with large state and action spaces through deep reinforcement learning. Reliab. Eng. Syst. Saf. 191, 106483 (2019). https://doi.org/10.1016/j.ress.2019.04.036
Andriotis, C.P., Papakonstantinou, K.G.: Deep reinforcement learning driven inspection and maintenance planning under incomplete information and constraints. Reliab. Eng. Syst. Saf. 212, 107551 (2021). https://doi.org/10.1016/j.ress.2021.107551
Andriotis, C.P., Papakonstantinou, K.G., Chatzi, E.N.: Value of structural health information in partially observable stochastic environments. Struct. Saf. 93, 102072 (2021). https://doi.org/10.1016/J.STRUSAFE.2020.102072
Arcieri, G., Hoelzl, C., Schwery, O., Straub, D., Papakonstantinou, K.G., Chatzi, E.: Bridging POMDPs and Bayesian decision making for robust maintenance planning under model uncertainty: an application to railway systems. Reliab. Eng. Syst. Saf. 239, 109496 (2023). https://www.sciencedirect.com/science/article/pii/S0951832023004106
Bismut, E., Straub, D.: Optimal adaptive inspection and maintenance planning for deteriorating structural systems. Reliab. Eng. Syst. Saf. 215, 107891 (2021). https://www.sciencedirect.com/science/article/pii/S0951832021004063
Bono, G., Dibangoye, J.S., Matignon, L., Pereyron, F., Simonin, O.: Cooperative multi-agent policy gradient. In: Berlingerio, M., Bonchi, F., Gärtner, T., Hurley, N., Ifrim, G. (eds.) ECML PKDD 2018. LNCS (LNAI), vol. 11051, pp. 459–476. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-10925-7_28
Christianos, F., Papoudakis, G., Rahman, A., Albrecht, S.V.: Scaling multi-agent reinforcement learning with selective parameter sharing (2021). http://arxiv.org/abs/2102.07475, arXiv:2102.07475 [cs]
Claus, C., Boutilier, C.: The dynamics of reinforcement learning in cooperative multiagent systems. In: Proceedings of the Fifteenth National/Tenth Conference on Artificial Intelligence/Innovative Applications of Artificial Intelligence, pp. 746–752. AAAI 1998/IAAI 1998, American Association for Artificial Intelligence, USA (1998)
Deodatis, G., Fujimoto, Y., Ito, S., Spencer, J., Itagaki, H.: Non-periodic inspection by Bayesian method I. Probab. Eng. Mech. 7(4), 191–204 (1992). https://www.sciencedirect.com/science/article/pii/026689209290023B
Foerster, J., Farquhar, G., Afouras, T., Nardelli, N., Whiteson, S.: Counterfactual multi-agent policy gradients (2017). http://arxiv.org/abs/1705.08926, arXiv:1705.08926 [cs]
Fulda, N., Ventura, D.: Predicting and preventing coordination problems in cooperative Q-learning systems. In: Proceedings of the 20th International Joint Conference on Artifical Intelligence, pp. 780–785. IJCAI 2007, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2007)
Goldman, C.V., Zilberstein, S.: Decentralized control of cooperative systems: categorization and complexity analysis. J. Artif. Intell. Res. 22, 143–174 (2004). https://doi.org/10.1613/jair.1427
Grall, A., Bérenguer, C., Dieulle, L.: A condition-based maintenance policy for stochastically deteriorating systems. Reliab. Eng. Syst. Saf. 76(2), 167–180 (2002). https://doi.org/10.1016/S0951-8320(01)00148-X
Gronauer, S., Diepold, K.: Multi-agent deep reinforcement learning: a survey. Artif. Intell. Rev. 55(2), 895–943 (2022). https://doi.org/10.1007/s10462-021-09996-w
Gupta, J.K., Egorov, M., Kochenderfer, M.: Cooperative multi-agent control using deep reinforcement learning. In: Sukthankar, G., Rodriguez-Aguilar, J.A. (eds.) AAMAS 2017. LNCS (LNAI), vol. 10642, pp. 66–83. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-71682-4_5
Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artif. Intell. 101(1–2), 99–134 (1998). https://doi.org/10.1016/S0004-3702(98)00023-X
Kapur, K.C., Pecht, M.: Reliability Engineering, 1st edn. Wiley, Hoboken (2014)
Kochenderfer, M.J., Wheeler, T.A., Wray, K.H.: Algorithms for Decision Making. MIT Press, Cambridge (2022)
Kuo, W., Zuo, M.: Optimal Reliability Modeling: Principles and Applications. Wiley, Hoboken (2003). https://catalogimages.wiley.com/images/db/pdf/047139761X.07.pdf
Leroy, P., Morato, P.G., Pisane, J., Kolios, A., Ernst, D.: IMP-MARL: a suite of environments for large-scale infrastructure management planning via MARL (2023). http://arxiv.org/abs/2306.11551, arXiv:2306.11551 [cs, eess]
Luque, J., Straub, D.: Risk-based optimal inspection strategies for structural systems using dynamic Bayesian networks. Struct. Saf. 76, 68–80 (2019). https://doi.org/10.1016/j.strusafe.2018.08.002
Lyu, X., Baisero, A., Xiao, Y., Daley, B., Amato, C.: On centralized critics in multi-agent reinforcement learning. J. Artif. Intell. Res. 77, 295–354 (2023). https://www.jair.org/index.php/jair/article/view/14386
Madanat, S., Ben-Akiva, M.: Optimal inspection and repair policies for infrastructure facilities. Transp. Sci. 28(1), 55–62 (1994). https://doi.org/10.1287/trsc.28.1.55
Matignon, L., Laurent, G.J., Le Fort-Piat, N.: Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems. Knowl. Eng. Rev. 27(1), 1–31 (2012). https://www.cambridge.org/core/product/identifier/S0269888912000057/type/journal_article
Morato, P.G., Andriotis, C.P., Papakonstantinou, K.G., Rigo, P.: Inference and dynamic decision-making for deteriorating systems with probabilistic dependencies through Bayesian networks and deep reinforcement learning. Reliab. Eng. Syst. Saf. 235, 109144 (2023). https://www.sciencedirect.com/science/article/pii/S0951832023000595
Morato, P.G., Papakonstantinou, K.G., Andriotis, C.P., Nielsen, J.S., Rigo, P.: Optimal inspection and maintenance planning for deteriorating structural components through dynamic Bayesian networks and Markov decision processes. Struct. Saf. 94, 102140 (2022). https://doi.org/10.1016/j.strusafe.2021.102140
Oliehoek, F.A., Amato, C.: A Concise Introduction to Decentralized POMDPs. SpringerBriefs in Intelligent Systems, Springer, Cham (2016)
Oliehoek, F.A., Spaan, M.T.J., Vlassis, N.: Optimal and approximate Q-value functions for decentralized POMDPs. J. Artif. Intell. Res. 32, 289–353 (2008). https://doi.org/10.1613/jair.2447
Papakonstantinou, K.G., Shinozuka, M.: Planning structural inspection and maintenance policies via dynamic programming and Markov processes. Part II: POMDP implementation. Reliab. Eng. Syst. Saf. 130, 214–224 (2014). https://doi.org/10.1016/j.ress.2014.04.006
Papakonstantinou, K.G., Andriotis, C.P., Shinozuka, M.: POMDP and MOMDP solutions for structural life-cycle cost minimization under partial and mixed observability. Struct. Infrastruct. Eng. 14(7), 869–882 (2018). https://doi.org/10.1080/15732479.2018.1439973
Papoudakis, G., Christianos, F., Schäfer, L., Albrecht, S.V.: Benchmarking multi-agent deep reinforcement learning algorithms in cooperative tasks (2021). http://arxiv.org/abs/2006.07869, arXiv:2006.07869 [cs, stat]
Peng, B., et al.: FACMAC: factored multi-agent centralised policy gradients (2021). http://arxiv.org/abs/2003.06709, arXiv:2003.06709 [cs, stat]
Rashid, T., Samvelyan, M., de Witt, C.S., Farquhar, G., Foerster, J., Whiteson, S.: QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning (2018). http://arxiv.org/abs/1803.11485, arXiv:1803.11485 [cs, stat]
Shani, G., Pineau, J., Kaplow, R.: A survey of point-based POMDP solvers. Auton. Agent. Multi-Agent Syst. 27, 1–51 (2013). https://doi.org/10.1007/s10458-012-9200-2
Spaan, M.T.J., Vlassis, N.: Perseus: randomized point-based value iteration for POMDPs. J. Artif. Intell. Res. 24, 195–220 (2005)
Sunehag, P., et al.: Value-decomposition networks for cooperative multi-agent learning (2017). http://arxiv.org/abs/1706.05296
Terry, J.K., Grammel, N., Son, S., Black, B.: Parameter sharing for heterogeneous agents in multi-agent reinforcement learning (2022). http://arxiv.org/abs/2005.13625
Wang, Z., et al.: Sample efficient actor-critic with experience replay (2017). http://arxiv.org/abs/1611.01224
Wong, A., Bäck, T., Kononova, A.V., Plaat, A.: Deep multiagent reinforcement learning: challenges and directions (2022). http://arxiv.org/abs/2106.15691, arXiv:2106.15691 [cs]
Yu, C., et al.: The surprising effectiveness of PPO in cooperative, multi-agent games (2022). http://arxiv.org/abs/2103.01955, arXiv:2103.01955 [cs]
Zhu, B., Frangopol, D.M.: Risk-based approach for optimum maintenance of bridges under traffic and earthquake loads. J. Struct. Eng. 139(3), 422–434 (2013). https://doi.org/10.1061/(ASCE)ST.1943-541X.0000671
Åström, K.J.: Optimal control of Markov processes with incomplete state information. J. Math. Anal. Appl. 10(1), 174–205 (1965). https://doi.org/10.1016/0022-247X(65)90154-X
Acknowledgements
This material is based upon work supported by the TU Delft AI Labs program. The authors gratefully acknowledge this support.
Appendix
As described in the main text, we use the 4-out-of-5 setting for hyperparameter optimization and report the tuned hyperparameters in Table 4. We note that decentralized agents often exhibit instabilities when trained with large replay buffers; their replay buffers are therefore much smaller than those of their centralized counterparts. The reason is that transitions sampled at random from a large buffer may have been generated by policies that differ substantially from the current one, forcing agents to drastically revise their recently learned policy.
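For illustration, a minimal bounded replay buffer sketch is shown below; the class name and capacity values are illustrative assumptions and do not correspond to the tuned values in Table 4. A bounded FIFO buffer evicts the oldest transitions first, so sampled experience stays closer to the current policy.

```python
import random
from collections import deque


class ReplayBuffer:
    """Minimal FIFO experience replay with a hard capacity limit.

    A small capacity keeps sampled transitions close to the current
    policy, which, per the observation above, helps stabilize
    decentralized agents.
    """

    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size: int):
        # Uniform random minibatch; capped so it never exceeds current size.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


# Illustrative capacities only (not the tuned values from Table 4):
decentralized_buffer = ReplayBuffer(capacity=2_000)
centralized_buffer = ReplayBuffer(capacity=50_000)
```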

For training, we employ a variant of the actor-critic with experience replay (ACER) algorithm [41], as introduced for I&M planning in [4, 5] and outlined above. To limit the variance induced by the importance sampling weights, we clip the weights \(w_i\) at \(\bar{w}=2\).
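As a concrete illustration of this truncation step, the sketch below clips importance sampling ratios at \(\bar{w}=2\); the function name, tensor inputs, and the small numerical stabilizer are assumptions made for the sketch, not the exact implementation used in this work.

```python
import torch


def clipped_importance_weights(pi_new: torch.Tensor,
                               pi_old: torch.Tensor,
                               w_bar: float = 2.0) -> torch.Tensor:
    """Truncated importance sampling weights for off-policy correction.

    pi_new, pi_old: probabilities of the replayed actions under the
    current and behavior policies, respectively (illustrative inputs).
    Returns w_i = pi_new / pi_old, clipped from above at w_bar.
    """
    w = pi_new / (pi_old + 1e-8)       # raw importance ratios (eps avoids div by 0)
    return torch.clamp(w, max=w_bar)   # truncate at w_bar = 2 to bound variance
```

Clipping only from above bounds the variance of the off-policy gradient estimate while leaving small ratios, and hence the direction of the correction, unchanged.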
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bhustali, P., Andriotis, C.P. (2025). Assessing the Optimality of Decentralized Inspection and Maintenance Policies for Stochastically Degrading Engineering Systems. In: Oliehoek, F.A., Kok, M., Verwer, S. (eds) Artificial Intelligence and Machine Learning. BNAIC/Benelearn 2023. Communications in Computer and Information Science, vol 2187. Springer, Cham. https://doi.org/10.1007/978-3-031-74650-5_13
DOI: https://doi.org/10.1007/978-3-031-74650-5_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-74649-9
Online ISBN: 978-3-031-74650-5