
Enhancing Deep Reinforcement Learning with Scenario-Based Modeling

  • Original Research
  • Published in: SN Computer Science

Abstract

Deep reinforcement learning (DRL) agents have achieved unprecedented results when learning to generalize from unstructured data. However, the "black-box" nature of trained DRL agents makes it difficult to ensure that they adhere to various requirements posed by engineers. In this work, we put forth a novel technique for enhancing the reinforcement learning training loop, and specifically its reward function, in a way that allows engineers to directly inject their expert knowledge into the training process. This allows us to make the trained agent adhere to multiple constraints of interest. Moreover, using scenario-based modeling techniques, our method allows users to formulate the desired constraints with advanced, well-established behavioral modeling methods. Combining such modeling methods with machine-learning tools produces agents that are both high performing and more likely to adhere to the prescribed constraints. Furthermore, the resulting agents are more transparent and hence more maintainable. We demonstrate our technique by evaluating it on a case study from the domain of internet congestion control, and present promising results.
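The reward-function enhancement described in the abstract can be sketched in a few lines: an engineer writes constraint rules over states and actions, and a wrapper subtracts a penalty from the environment's reward whenever a rule fires during training. This is a minimal illustration under stated assumptions, not the authors' implementation; the names `ToyCongestionEnv`, `rate_cap_rule`, and `ScenarioShapedEnv`, and the Gym-like `step` interface, are all hypothetical.

```python
# Hedged sketch: scenario-assisted reward shaping. Engineer-supplied rules
# flag constraint violations; the wrapper penalizes the reward accordingly,
# steering training toward policies that respect the constraints.

class ToyCongestionEnv:
    """Minimal stand-in for a congestion-control environment (illustrative)."""
    def __init__(self):
        self.state = {"loss_rate": 0.2, "rate": 1.0}

    def step(self, action):
        if action == "increase":
            self.state["rate"] += 0.1
        # Base reward: throughput minus observed loss (a toy objective).
        reward = self.state["rate"] - self.state["loss_rate"]
        return dict(self.state), reward, False  # observation, reward, done


def rate_cap_rule(state, action):
    """Engineer-written constraint: never raise the sending rate under high loss."""
    return state["loss_rate"] > 0.1 and action == "increase"


class ScenarioShapedEnv:
    """Wraps an environment; subtracts a penalty each time a rule fires."""
    def __init__(self, env, rules, penalty=5.0):
        self.env, self.rules, self.penalty = env, rules, penalty

    def step(self, action):
        prev = dict(self.env.state)            # state before the transition
        obs, reward, done = self.env.step(action)
        fired = sum(1 for rule in self.rules if rule(prev, action))
        return obs, reward - self.penalty * fired, done
```

A training loop would then interact with `ScenarioShapedEnv` exactly as with the original environment; because only the reward signal changes, any off-the-shelf DRL algorithm can be used unmodified.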




Funding

The work of R. Yerushalmi, G. Amir, A. Elyasaf and G. Katz was partially supported by a grant from the Israeli Smart Transportation Research Center (ISTRC). The work of G. Amir was supported by a scholarship from the Clore Israel Foundation. The work of D. Harel, A. Marron and R. Yerushalmi was partially supported by a research grant from the Estate of Harry Levine, the Estate of Avraham Rothstein, Brenda Gruss and Daniel Hirsch, the One8 Foundation, Rina Mayer, Maurice Levy, and the Estate of Bernice Bernath; by joint ISF-NSFC grant 3698/21 from the Israel Science Foundation and the National Science Foundation of China; and by a grant from the Minerva Foundation.

Author information

Corresponding author: Raz Yerushalmi.

Ethics declarations

Conflict of interest

The authors of this work declare that there are no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances on Model-Driven Engineering and Software Development” guest edited by Luís Ferreira Pires and Slimane Hammoudi.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yerushalmi, R., Amir, G., Elyasaf, A. et al. Enhancing Deep Reinforcement Learning with Scenario-Based Modeling. SN COMPUT. SCI. 4, 156 (2023). https://doi.org/10.1007/s42979-022-01575-2
