Towards Systematically Engineering Autonomous Systems Using Reinforcement Learning and Planning

Analysis, Verification and Transformation for Declarative Programming and Intelligent Systems

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13160)

Abstract

Autonomous systems need to be able to adapt dynamically to changing requirements and environmental conditions, without redeployment and without interruption of the system's functionality. The EU project ASCENS has developed a comprehensive suite of foundational theories and methods for building autonomic systems. In this paper we specialise the EDLC (ensemble development life cycle) process model of ASCENS to deal with planning and reinforcement learning techniques. We present the “AIDL” life cycle and illustrate it with two case studies: simulation-based online planning and the PSyCo reinforcement learning approach for synthesizing agent policies from hard and soft requirements. Related work and potential avenues for future research are discussed.

Dedicated to Manuel Hermenegildo.

Notes

  1. Also called internal model or simulation model in the literature.

  2. https://de.mathworks.com/products/reinforcement-learning.html

  3. https://gym.openai.com/

  4. https://pytorch.org/

  5. https://www.tensorflow.org/

References

  1. ASCENS: Autonomic Component Ensembles. Integrated Project, 01 Oct 2010–31 Mar 2015, Grant agreement no: 257414, EU 7th Framework Programme. http://www.ascens-ist.eu/. Accessed 21 Apr 2020

  2. Gartner Inc.: Market Guide for AIOps Platforms (2019). https://www.bmc.com/forms/tools-and-strategies-for-effective-aiops.html. Accessed 07 Oct 2020

  3. Google Cloud Solutions: MLOps: Continuous delivery and automation pipelines in machine learning. https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning. Accessed 07 Oct 2020

  4. OpenAI. Spinning Up in Deep RL! Part 2: Kinds of RL Algorithms (2018). https://spinningup.openai.com. Accessed 07 July 2020

  5. Abeywickrama, D., Bicocchi, N., Mamei, M., Zambonelli, F.: The SOTA approach to engineering collective adaptive systems. Int. J. Softw. Tools Technol. Transf. 22(4), 399–415 (2020)

  6. Alshiekh, M., Bloem, R., Ehlers, R., Könighofer, B., Niekum, S., Topcu, U.: Safe reinforcement learning via shielding. In: AAAI, pp. 2669–2678. AAAI Press (2018)

  7. Altman, E.: Constrained Markov Decision Processes, vol. 7. CRC Press, Boca Raton (1999)

  8. Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., Mané, D.: Concrete problems in AI safety. CoRR, abs/1606.06565 (2016)

  9. Beavis, B., Dobbs, I.: Optimisation and Stability Theory for Economic Analysis. Cambridge University Press, Cambridge (1990)

  10. Belzner, L., Hennicker, R., Wirsing, M.: OnPlan: a framework for simulation-based online planning. In: Braga, C., Ölveczky, P.C. (eds.) FACS 2015. LNCS, vol. 9539, pp. 1–30. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-28934-2_1

  11. Belzner, L., Hölzl, M.M., Koch, N., Wirsing, M.: Collective autonomic systems: towards engineering principles and their foundations. Trans. Found. Mastering Chang. 1, 180–200 (2016)

  12. Belzner, L., Wirsing, M.: Synthesizing safe policies under probabilistic constraints with reinforcement learning and Bayesian model checking. Sci. Comput. Program. 206, 102620 (2021)

  13. Bernardo, M., De Nicola, R., Hillston, J. (eds.): Formal Methods for the Quantitative Evaluation of Collective Adaptive Systems, SFM 2016. LNCS, vol. 9700. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-34096-8

  14. Bresciani, P., Perini, A., Giorgini, P., Giunchiglia, F., Mylopoulos, J.: Tropos: an agent-oriented software development methodology. JAAMAS 8(3), 203–236 (2004)

  15. Browne, C., et al.: A survey of Monte Carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 4(1), 1–43 (2012)

  16. Brun, Y., et al.: Engineering self-adaptive systems through feedback loops. In: Cheng, B.H.C., de Lemos, R., Giese, H., Inverardi, P., Magee, J. (eds.) Software Engineering for Self-Adaptive Systems. LNCS, vol. 5525, pp. 48–70. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02161-9_3

  17. Bureš, T., et al.: A life cycle for the development of autonomic systems: the e-mobility showcase. In: SASO Workshops, pp. 71–76 (2013)

  18. Clavera, I., Rothfuss, J., Schulman, J., Fujita, Y., Asfour, T., Abbeel, P.: Model-based reinforcement learning via meta-policy optimization. In: CoRL 2018, Proceedings of Machine Learning Research, vol. 87, pp. 617–629. PMLR (2018)

  19. De Nicola, R., Loreti, M., Pugliese, R., Tiezzi, F.: A formal approach to autonomic systems programming: the SCEL language. ACM Trans. Auton. Adapt. Syst. 9(2), 7:1–7:29 (2014)

  20. Drugan, M.M.: Reinforcement learning versus evolutionary computation: a survey on hybrid algorithms. Swarm Evol. Comput. 44, 228–246 (2019)

  21. Fernandez-Marquez, J.L., Serugendo, G.D.M., Montagna, S., Viroli, M., Arcos, J.L.: Description and composition of bio-inspired design patterns: a complete overview. Nat. Comput. 12(1), 43–67 (2013)

  22. Gabor, T., et al.: The scenario coevolution paradigm: adaptive quality assurance for adaptive systems. Int. J. Softw. Tools Technol. Transf. 22, 457–476 (2020)

  23. Hansson, H., Jonsson, B.: A logic for reasoning about time and reliability. Formal Asp. Comput. 6(5), 512–535 (1994)

  24. Hasanbeig, M., Abate, A., Kroening, D.: Cautious reinforcement learning with logical constraints. In: AAMAS, pp. 483–491. International Foundation for Autonomous Agents and Multiagent Systems (2020)

  25. Hoch, N., Bensler, H.-P., Abeywickrama, D., Bureš, T., Montanari, U.: The E-mobility case study. In: Wirsing, M., Hölzl, M., Koch, N., Mayer, P. (eds.) Software Engineering for Collective Autonomic Systems. LNCS, vol. 8998, pp. 513–533. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16310-9_17

  26. Horn, P.: Autonomic computing: IBM perspective on the state of information technology. IBM T.J. Watson Labs, NY (2001)

  27. Hölzl, M., Koch, N., Puviani, M., Wirsing, M., Zambonelli, F.: The ensemble development life cycle and best practices for collective autonomic systems. In: Wirsing, M., Hölzl, M., Koch, N., Mayer, P. (eds.) Software Engineering for Collective Autonomic Systems. LNCS, vol. 8998, pp. 325–354. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16310-9_9

  28. Hölzl, M., Rauschmayer, A., Wirsing, M.: Engineering of software-intensive systems: state of the art and research challenges. In: Wirsing, M., Banâtre, J.-P., Hölzl, M., Rauschmayer, A. (eds.) Software-Intensive Systems and New Computing Paradigms. LNCS, vol. 5380, pp. 1–44. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89437-7_1

  29. Hölzl, M., Wirsing, M.: Towards a system model for ensembles. In: Agha, G., Danvy, O., Meseguer, J. (eds.) Formal Modeling: Actors, Open Systems, Biological Systems. LNCS, vol. 7000, pp. 241–261. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24933-4_12

  30. IBM: An architectural blueprint for autonomic computing. Technical report, IBM Corporation (2005)

  31. Inverardi, P., Mori, M.: A software lifecycle process to support consistent evolutions. In: de Lemos, R., Giese, H., Müller, H.A., Shaw, M. (eds.) Software Engineering for Self-Adaptive Systems II. LNCS, vol. 7475, pp. 239–264. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35813-5_10

  32. Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artif. Intell. 101(1–2), 99–134 (1998)

  33. Kernbach, S., Schmickl, T., Timmis, J.: Collective adaptive systems: challenges beyond evolvability. CoRR abs/1108.5643 (2011)

  34. Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006). https://doi.org/10.1007/11871842_29

  35. Krutisch, R., Meier, P., Wirsing, M.: The AgentComponent approach, combining agents, and components. In: Schillo, M., Klusch, M., Müller, J., Tianfield, H. (eds.) MATES 2003. LNCS (LNAI), vol. 2831, pp. 1–12. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-39869-1_1

  36. Legay, A., Delahaye, B., Bensalem, S.: Statistical model checking: an overview. In: Barringer, H., et al. (eds.) RV 2010. LNCS, vol. 6418, pp. 122–135. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-16612-9_11

  37. Loreti, M., Hillston, J.: Modelling and analysis of collective adaptive systems with CARMA and its tools. In: Bernardo, M., De Nicola, R., Hillston, J. (eds.) SFM 2016. LNCS, vol. 9700, pp. 83–119. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-34096-8_4

  38. Mayer, P., et al.: The autonomic cloud. In: Wirsing, M., Hölzl, M., Koch, N., Mayer, P. (eds.) Software Engineering for Collective Autonomic Systems. LNCS, vol. 8998, pp. 495–512. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16310-9_16

  39. Moerland, T.M., Broekens, J., Jonker, C.M.: A framework for reinforcement learning and planning. CoRR, abs/2006.15009 (2020)

  40. Moerland, T.M., Broekens, J., Jonker, C.M.: Model-based reinforcement learning: a survey. CoRR, abs/2006.16712 (2020)

  41. Moerland, T.M., Deichler, A., Baldi, S., Broekens, J., Jonker, C.M.: Think too fast nor too slow: The computational trade-off between planning and reinforcement learning. CoRR, abs/2005.07404 (2020)

  42. Nagabandi, A., et al.: Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In: ICLR 2019. OpenReview.net (2019)

  43. Ong, S.C.W., Png, S.W., Hsu, D., Lee, W.S.: Planning under uncertainty for robotic tasks with mixed observability. Int. J. Robot. Res. 29(8), 1053–1068 (2010)

  44. Pinciroli, C., Bonani, M., Mondada, F., Dorigo, M.: Adaptation and awareness in robot ensembles: scenarios and algorithms. In: Wirsing, M., Hölzl, M., Koch, N., Mayer, P. (eds.) Software Engineering for Collective Autonomic Systems. LNCS, vol. 8998, pp. 471–494. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16310-9_15

  45. Puviani, M., Cabri, G., Zambonelli, F.: Patterns for self-adaptive systems: agent-based simulations. EAI Endorsed Trans. Self-Adapt. Syst. 1(1), e4 (2015)

  46. Rao, A.S., Georgeff, M.P.: Modeling rational agents within a BDI-architecture. In: Proceedings of the Knowledge Representation and Reasoning, pp. 473–484 (1991)

  47. Ray, A., Achiam, J., Amodei, D.: Benchmarking safe exploration in deep reinforcement learning. Technical report, OpenAI (2019)

  48. Roche, J.: Adopting DevOps practices in quality assurance. Commun. ACM 56(11), 38–43 (2013)

  49. Ross, S., Pineau, J., Paquet, S., Chaib-draa, B.: Online planning algorithms for POMDPs. J. Artif. Intell. Res. 32, 663–704 (2008)

  50. Sebastio, S., Vandin, A.: MultiVeStA: statistical model checking for discrete event simulators. In: ValueTools 2013, pp. 310–315. ICST/ACM (2013)

  51. Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)

  52. Silver, D., et al.: Mastering the game of Go without human knowledge. Nature 550(7676), 354–359 (2017)

  53. Sutton, R.S., Barto, A.G.: Reinforcement Learning - an Introduction. Adaptive Computation and Machine Learning, 2nd edn. MIT Press, Cambridge (2018)

  54. Szepesvári, C.: Algorithms for Reinforcement Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 4, pp. 1–103. Morgan & Claypool Publishers, California (2010)

  55. Taylor, M.E., Stone, P.: Transfer learning for reinforcement learning domains: a survey. J. Mach. Learn. Res. 10, 1633–1685 (2009)

  56. Thrun, S., Pratt, L.Y.: Learning to learn: introduction and overview. In: Thrun, S., Pratt, L.Y. (eds.) Learning to Learn, pp. 3–17. Springer, Boston (1998). https://doi.org/10.1007/978-1-4615-5529-2_1

  57. Tschaikowski, M., Tribastone, M.: A unified framework for differential aggregations in Markovian process algebra. J. Log. Alg. Meth. Prog. 84(2), 238–258 (2015)

  58. Vassev, E., Hinchey, M.: Engineering requirements for autonomy features. In: Wirsing, M., Hölzl, M., Koch, N., Mayer, P. (eds.) Software Engineering for Collective Autonomic Systems. LNCS, vol. 8998, pp. 379–403. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16310-9_11

  59. Vilalta, R., Giraud-Carrier, C., Brazdil, P., Soares, C.: Inductive transfer. In: Sammut, C., Webb, G.I. (eds.) Encyclopedia of Machine Learning and Data Mining, pp. 666–671. Springer, Boston (2017). https://doi.org/10.1007/978-1-4899-7687-1_138

  60. Šerbedžija, N., Fairclough, S.: Biocybernetic loop: from awareness to evolution. In: IEEE Congress on Evolutionary Computation (CEC 2009), pp. 2063–2069. IEEE (2009)

  61. Wang, T., et al.: Benchmarking model-based reinforcement learning. CoRR, abs/1907.02057 (2019)

  62. Weinstein, A., Littman, M.: Open-loop planning in large-scale stochastic domains. In: AAAI 2013. AAAI Press (2013)

  63. Weyns, D., et al.: On patterns for decentralized control in self-adaptive systems. In: de Lemos, R., Giese, H., Müller, H.A., Shaw, M. (eds.) Software Engineering for Self-Adaptive Systems II. LNCS, vol. 7475, pp. 76–107. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35813-5_4

  64. Wirsing, M., Banâtre, J.-P., Hölzl, M., Rauschmayer, A. (eds.): Software-Intensive Systems and New Computing Paradigms - Challenges and Visions. LNCS, vol. 5380. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89437-7

  65. Wirsing, M., Hölzl, M., Koch, N., Mayer, P. (eds.): Software Engineering for Collective Autonomic Systems - The ASCENS Approach. LNCS, vol. 8998. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16310-9

  66. Wirsing, M., Hölzl, M., Tribastone, M., Zambonelli, F.: ASCENS: engineering autonomic service-component ensembles. In: Beckert, B., Damiani, F., de Boer, F.S., Bonsangue, M.M. (eds.) FMCO 2011. LNCS, vol. 7542, pp. 1–24. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35887-6_1

  67. Wooldridge, M.J., Jennings, N.R.: Intelligent agents: theory and practice. Knowl. Eng. Rev. 10(2), 115–152 (1995)

  68. Zambonelli, F., Jennings, N.R., Wooldridge, M.J.: Developing multiagent systems: the Gaia method. ACM Trans. Softw. Eng. Meth. 12(3), 317–370 (2003)

  69. Zuliani, P., Platzer, A., Clarke, E.M.: Bayesian statistical model checking with application to Simulink verification. Formal Meth. Syst. Des. 43(2), 338–367 (2013)

Acknowledgement

We thank the anonymous reviewer for the constructive criticism and helpful suggestions.

Author information

Corresponding author

Correspondence to Martin Wirsing.

A Markov Decision Processes

A Markov Decision Process (MDP) M defines a domain as a set S of states, comprising all states of the environment and the agent, a set A of agent actions, and a probability distribution \(T : p(S \vert S, A)\) describing the transition probabilities of reaching a successor state when executing an action in a given state. To express optimisation goals, the labelled transition system is extended by a reward function \(R : S \times A \times S \rightarrow \mathbb {R}\), which gives the expected immediate reward gained by the agent for taking each action in each state. Moreover, an initial state distribution \(\rho : p(S)\) is given.
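
To make these ingredients concrete, the following minimal Python sketch represents a finite MDP \((S, A, T, R, \rho )\) with explicit probability tables and sampling routines. It is illustrative only: the class and attribute names are ours, not the chapter's or any particular library's.

```python
import random

class MDP:
    """Finite MDP (S, A, T, R, rho) with tabular distributions (illustrative)."""

    def __init__(self, states, actions, transition, reward, initial):
        self.states = states          # state set S
        self.actions = actions        # action set A
        self.transition = transition  # T[s][a] -> {s': prob}, i.e. p(S | S, A)
        self.reward = reward          # R(s, a, s') -> float
        self.initial = initial        # rho as {s: prob}

    def reset(self):
        # Sample an initial state from rho.
        states, probs = zip(*self.initial.items())
        return random.choices(states, weights=probs)[0]

    def step(self, s, a):
        # Sample a successor state from T(. | s, a) and compute the reward.
        successors, probs = zip(*self.transition[s][a].items())
        s_next = random.choices(successors, weights=probs)[0]
        return s_next, self.reward(s, a, s_next)
```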

An episode \(\textbf{e} \in E\) is a finite or infinite sequence of transitions \((s_i, a_i, s_{i + 1}, r_i)\), \(s_i, s_{i + 1} \in S\), \(a_i \in A\), \(r_i = R(s_i, a_i, s_{i + 1})\) in the MDP. For a given discount parameter \(\gamma \in [0,1]\) and any finite or infinite episode \(\textbf{e}\), the cumulative return \(\mathcal {R}\) is the discounted sum of rewards \(\mathcal {R} = \sum _{i = 1}^{|\textbf{e}|} \gamma ^{i} r_i\). Depending on the application, the agent behaves in an environment according to a memoryless stationary policy \(\pi : S \rightarrow p(A)\) or according to a deterministic memoryless policy \(\pi : S \rightarrow A\), with the goal of maximising the expectation of the cumulative return \(\mathbb {E}(\mathcal {R})\).
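
Continuing the sketch above (same assumed encoding), a rollout function samples a finite episode under a memoryless stationary policy and accumulates the discounted return exactly as defined:

```python
def rollout(mdp, policy, gamma, horizon):
    """Sample one finite episode under pi : S -> p(A), given as
    policy(s) -> {a: prob}, and compute R = sum_i gamma^i * r_i."""
    s = mdp.reset()
    episode, ret = [], 0.0
    for i in range(1, horizon + 1):              # transitions indexed from 1, as above
        actions, probs = zip(*policy(s).items())  # pi(. | s)
        a = random.choices(actions, weights=probs)[0]
        s_next, r = mdp.step(s, a)
        episode.append((s, a, s_next, r))
        ret += gamma ** i * r                     # discounted sum of rewards
        s = s_next
    return episode, ret
```

Averaging the returns of many such rollouts gives a Monte Carlo estimate of \(\mathbb {E}(\mathcal {R})\) for a fixed policy.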

A partially observable Markov Decision Process (POMDP) [32] is a Markov decision process together with a set \(\varOmega \) of observations and an observation probability distribution \(O : p(\varOmega \vert S, A)\).
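
In a POMDP, only what the agent perceives changes. Under the same illustrative encoding, the observation distribution can be stored as nested tables and sampled after every transition; a policy then has to condition on the observation history or a belief state rather than on the state itself:

```python
def observe(obs_model, s_next, a):
    """Sample an observation from O(. | s', a), with the observation
    distribution encoded as obs_model[s'][a] -> {o: prob} (illustrative)."""
    observations, probs = zip(*obs_model[s_next][a].items())
    return random.choices(observations, weights=probs)[0]
```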

A Constrained Markov Decision Process (CMDP) [7] is a Markov decision process with an additional cost function \(C : S \times A \times S \rightarrow \mathbb {R}\), which can be used to express constraints and safety goals.
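
Accordingly, a CMDP rollout can track cumulative cost alongside the return, so that a constraint such as a bound on the expected cumulative cost can be estimated from samples. Again a sketch under the assumptions above; whether costs are discounted is a modelling choice:

```python
def constrained_rollout(mdp, cost, policy, gamma, horizon):
    """Like rollout(), but also accumulates the cost C(s, a, s') so that
    a constraint such as E[cumulative cost] <= d can be estimated."""
    s = mdp.reset()
    ret, cum_cost = 0.0, 0.0
    for i in range(1, horizon + 1):
        actions, probs = zip(*policy(s).items())
        a = random.choices(actions, weights=probs)[0]
        s_next, r = mdp.step(s, a)
        ret += gamma ** i * r
        cum_cost += cost(s, a, s_next)   # undiscounted costs here
        s = s_next
    return ret, cum_cost
```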

Copyright information

© 2023 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Wirsing, M., Belzner, L. (2023). Towards Systematically Engineering Autonomous Systems Using Reinforcement Learning and Planning. In: Lopez-Garcia, P., Gallagher, J.P., Giacobazzi, R. (eds) Analysis, Verification and Transformation for Declarative Programming and Intelligent Systems. Lecture Notes in Computer Science, vol 13160. Springer, Cham. https://doi.org/10.1007/978-3-031-31476-6_16

  • DOI: https://doi.org/10.1007/978-3-031-31476-6_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-31475-9

  • Online ISBN: 978-3-031-31476-6
