
Enhancing Deep Reinforcement Learning with Scenario-Based Modeling

  • Original Research
  • Published in: SN Computer Science

Abstract

Deep reinforcement learning (DRL) agents have achieved unprecedented results when learning to generalize from unstructured data. However, the "black-box" nature of trained DRL agents makes it difficult to ensure that they adhere to various requirements posed by engineers. In this work, we put forth a novel technique for enhancing the reinforcement learning training loop, and specifically its reward function, in a way that allows engineers to directly inject their expert knowledge into the training process. This allows us to make the trained agent adhere to multiple constraints of interest. Moreover, using scenario-based modeling techniques, our method allows users to formulate the desired constraints with advanced, well-established behavioral modeling methods. Combining such modeling methods with machine-learning tools produces agents that are both high performing and more likely to adhere to the prescribed constraints. Furthermore, the resulting agents are more transparent and hence more maintainable. We demonstrate our technique by evaluating it on a case study from the domain of internet congestion control, and present promising results.
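The reward-function enhancement described in the abstract can be sketched in a few lines: an engineer writes constraint rules over states and actions, and a wrapper subtracts a penalty from the environment's reward whenever a rule fires during training. This is a minimal illustration under stated assumptions, not the authors' implementation; the names `ToyCongestionEnv`, `rate_cap_rule`, and `ScenarioShapedEnv`, and the Gym-like `step` interface, are all hypothetical.

```python
# Hedged sketch: scenario-assisted reward shaping. Engineer-supplied rules
# flag constraint violations; the wrapper penalizes the reward accordingly,
# steering training toward policies that respect the constraints.

class ToyCongestionEnv:
    """Minimal stand-in for a congestion-control environment (illustrative)."""
    def __init__(self):
        self.state = {"loss_rate": 0.2, "rate": 1.0}

    def step(self, action):
        if action == "increase":
            self.state["rate"] += 0.1
        # Base reward: throughput minus observed loss (a toy objective).
        reward = self.state["rate"] - self.state["loss_rate"]
        return dict(self.state), reward, False  # observation, reward, done


def rate_cap_rule(state, action):
    """Engineer-written constraint: never raise the sending rate under high loss."""
    return state["loss_rate"] > 0.1 and action == "increase"


class ScenarioShapedEnv:
    """Wraps an environment; subtracts a penalty each time a rule fires."""
    def __init__(self, env, rules, penalty=5.0):
        self.env, self.rules, self.penalty = env, rules, penalty

    def step(self, action):
        prev = dict(self.env.state)            # state before the transition
        obs, reward, done = self.env.step(action)
        fired = sum(1 for rule in self.rules if rule(prev, action))
        return obs, reward - self.penalty * fired, done
```

A training loop would then interact with `ScenarioShapedEnv` exactly as with the original environment; because only the reward signal changes, any off-the-shelf DRL algorithm can be used unmodified.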




Funding

The work of R. Yerushalmi, G. Amir, A. Elyasaf and G. Katz was partially supported by a grant from the Israeli Smart Transportation Research Center (ISTRC). The work of G. Amir was supported by a scholarship from the Clore Israel Foundation. The work of D. Harel, A. Marron and R. Yerushalmi was partially supported by a research grant from the Estate of Harry Levine, the Estate of Avraham Rothstein, Brenda Gruss and Daniel Hirsch, the One8 Foundation, Rina Mayer, Maurice Levy, and the Estate of Bernice Bernath; by joint ISF-NSFC grant 3698/21 from the Israel Science Foundation and the National Science Foundation of China; and by a grant from the Minerva Foundation.

Author information

Corresponding author: Raz Yerushalmi.

Ethics declarations

Conflict of interest

The authors of this work declare that there are no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances on Model-Driven Engineering and Software Development” guest edited by Luís Ferreira Pires and Slimane Hammoudi.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yerushalmi, R., Amir, G., Elyasaf, A. et al. Enhancing Deep Reinforcement Learning with Scenario-Based Modeling. SN COMPUT. SCI. 4, 156 (2023). https://doi.org/10.1007/s42979-022-01575-2
