DOI: 10.1145/3501710.3524734
Research article | Open access

Poster Abstract: Model-Free Reinforcement Learning for Symbolic Automata-encoded Objectives

Published: 04 May 2022

Abstract

In this work, we propose symbolic automata as formal specifications for reinforcement learning agents. Symbolic automata generalize both bounded-time temporal-logic specifications and deterministic finite automata, allowing us to describe input alphabets over metric spaces. They also let us define non-sparse, potential-based rewards that empirically shape the reward surface, leading to better convergence during RL. We further show that this potential-based rewarding strategy still yields the policy that maximizes satisfaction of the given specification.
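The potential-based rewarding strategy mentioned above can be illustrated with the standard Ng-style shaping term F(q, q') = γ·Φ(q') − Φ(q), applied here to the state of a symbolic automaton whose transition guards are predicates over a metric (real-valued) alphabet. The toy automaton, potential values, and all names below are our own illustrative assumptions, not the paper's implementation:

```python
GAMMA = 0.99

# A toy symbolic automaton over a 1-D metric alphabet encoding the task
# "first drive the signal x above 5, then bring it down to 0 or below".
# Each transition is guarded by a predicate over real-valued observations.
TRANSITIONS = {
    0: [(lambda x: x >= 5.0, 1)],   # q0 -> q1 once x exceeds 5
    1: [(lambda x: x <= 0.0, 2)],   # q1 -> q2 (accepting) once x reaches 0
}
ACCEPTING = {2}

def step_automaton(q, x):
    """Advance the automaton on observation x; stay put if no guard fires."""
    for guard, q_next in TRANSITIONS.get(q, []):
        if guard(x):
            return q_next
    return q

# Potential of each automaton state: negated number of transitions still
# needed to reach acceptance. This gives a dense signal instead of the
# sparse "reward only on acceptance" objective.
POTENTIAL = {0: -2.0, 1: -1.0, 2: 0.0}

def shaped_reward(q, q_next, base_reward):
    """Potential-based shaping: r + gamma * Phi(q') - Phi(q)."""
    return base_reward + GAMMA * POTENTIAL[q_next] - POTENTIAL[q]
```

Because the shaping term telescopes along any trajectory, adding it to the base reward densifies the learning signal without changing which policies are optimal, which is why the satisfaction-maximizing policy is preserved.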


Cited By

  • (2024) Optimal Runtime Assurance via Reinforcement Learning. In 2024 ACM/IEEE 15th International Conference on Cyber-Physical Systems (ICCPS), 67–76. https://doi.org/10.1109/ICCPS61052.2024.00013. Online publication date: 13 May 2024.


Published In

HSCC '22: Proceedings of the 25th ACM International Conference on Hybrid Systems: Computation and Control
May 2022, 265 pages
ISBN: 9781450391962
DOI: 10.1145/3501710
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited


Acceptance Rates

Overall acceptance rate: 153 of 373 submissions, 41%
