Abstract
Modern system design and development often consist of combining components developed by separate vendors under known constraints that allow them to operate together. Such a system may benefit from further refinement once the components are integrated. We suggest a learning-open architecture that employs deep reinforcement learning under weak assumptions. The components are “black boxes”, whose internal structure is not known, and the learning is performed in a distributed way, where each process is aware only of its local execution information and of the global utility value of the system, calculated after complete executions. We employ proximal policy optimization (PPO) as our learning architecture, adapted to training control for black-box components. We first apply the PPO architecture to a simplified case, where a single component connected to a black-box environment must be trained; we show a stark improvement over a previous attempt. We then move on to study the case of multiple components.
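The single-component setting described above can be sketched as a minimal, purely illustrative PPO-clip loop. Everything here is hypothetical (the two-action interaction alphabet, the `black_box_utility` function, the fixed baseline of 0.5): the learner samples a complete execution trace, observes only the global utility revealed at the end, and updates a softmax policy with the PPO clipped-ratio rule.

```python
import math
import random

random.seed(0)

ACTIONS = [0, 1]   # abstract interaction choices offered to the component
EPS_CLIP = 0.2     # PPO clipping parameter
LR = 0.5           # learning rate for the softmax preferences

def black_box_utility(trace):
    """Hypothetical black box: the global utility is revealed only after a
    complete execution; its internal structure is unknown to the learner."""
    return sum(trace) / len(trace)

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def run_episode(prefs, length=5):
    """One complete execution: sample a trace of interactions."""
    probs = softmax(prefs)
    trace = [random.choices(ACTIONS, probs)[0] for _ in range(length)]
    return trace, probs

prefs = [0.0, 0.0]
for _ in range(300):
    trace, old_probs = run_episode(prefs)
    utility = black_box_utility(trace)   # the only feedback the learner sees
    advantage = utility - 0.5            # hypothetical fixed baseline
    for a in trace:
        new_probs = softmax(prefs)
        ratio = new_probs[a] / old_probs[a]
        # PPO clipped surrogate: no gradient once the ratio leaves the trust region
        if (advantage > 0 and ratio > 1 + EPS_CLIP) or \
           (advantage < 0 and ratio < 1 - EPS_CLIP):
            continue
        # gradient of log pi(a) with respect to the softmax preferences
        for i in ACTIONS:
            grad = (1.0 if i == a else 0.0) - new_probs[i]
            prefs[i] += LR * advantage * ratio * grad

print(round(softmax(prefs)[1], 2))  # probability assigned to action 1 after training
```

A real instantiation would replace the tabular softmax with a neural network and the toy utility with the actual composed system, but the key constraint is the same: the gradient signal is derived solely from end-of-execution global utility, never from the black box's internals.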
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 956123 - FOCETA.
Notes
- 1.
This can also be easily generalized to employ a utility function calculated from the local utility functions of each component; for example, each component can compute some measure of success (e.g., in the form of a discounted sum) over its interactions, and the global utility can be the sum of this measure over the participating processes.
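As a concrete instance of this aggregation, the following sketch (with hypothetical helper names and an arbitrary discount factor) computes each process's local measure as a discounted sum of per-interaction successes and takes the global utility to be the sum of the local measures:

```python
# Hypothetical local measure: a discounted sum of per-interaction successes,
# where successes[t] is 1 if the t-th interaction of the process succeeded.
def local_utility(successes, gamma=0.9):
    return sum(s * gamma ** t for t, s in enumerate(successes))

# Global utility: the sum of the local measures over the participating processes.
def global_utility(per_process_successes, gamma=0.9):
    return sum(local_utility(s, gamma) for s in per_process_successes)

# Two processes, three interactions each: 1 + 0.81 and 0.9 + 0.81.
print(round(global_utility([[1, 0, 1], [0, 1, 1]]), 2))  # → 3.52
```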
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cohen, I., Peled, D. (2024). Integrating Distributed Component-Based Systems Through Deep Reinforcement Learning. In: Steffen, B. (ed.) Bridging the Gap Between AI and Reality. AISoLA 2023. Lecture Notes in Computer Science, vol. 14380. Springer, Cham. https://doi.org/10.1007/978-3-031-46002-9_26
DOI: https://doi.org/10.1007/978-3-031-46002-9_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46001-2
Online ISBN: 978-3-031-46002-9