Paper

Authors: Philipp Scholl 1,2 ; Felix Dietrich 3 ; Clemens Otte 2 and Steffen Udluft 2

Affiliations: 1 Department of Mathematics, Ludwig-Maximilian-University of Munich, Munich, Germany ; 2 Learning Systems, Siemens Technology, Munich, Germany ; 3 Department of Informatics, Technical University of Munich, Munich, Germany

Keyword(s): Risk-sensitive Reinforcement Learning, Safe Policy Improvement, Markov Decision Processes.

Abstract: Safe Policy Improvement (SPI) aims at provable guarantees that a learned policy is at least approximately as good as a given baseline policy. Building on SPI with Soft Baseline Bootstrapping (Soft-SPIBB) by Nadjahi et al., we identify theoretical issues in their approach, provide a corrected theory, and derive a new algorithm that is provably safe on finite Markov Decision Processes (MDPs). Additionally, we provide a heuristic algorithm that exhibits the best performance among many state-of-the-art SPI algorithms on two different benchmarks. Furthermore, we introduce a taxonomy of SPI algorithms and empirically show an interesting property of two classes of SPI algorithms: while the mean performance of algorithms that incorporate the uncertainty as a penalty on the action-value is higher, actively restricting the set of policies more consistently produces good policies and is, thus, safer.

CC BY-NC-ND 4.0

Paper citation in several formats:
Scholl, P., Dietrich, F., Otte, C. and Udluft, S. (2022). Safe Policy Improvement Approaches on Discrete Markov Decision Processes. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART; ISBN 978-989-758-547-0; ISSN 2184-433X, SciTePress, pages 142-151. DOI: 10.5220/0010786600003116

@conference{icaart22,
author={Philipp Scholl and Felix Dietrich and Clemens Otte and Steffen Udluft},
title={Safe Policy Improvement Approaches on Discrete Markov Decision Processes},
booktitle={Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2022},
pages={142-151},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010786600003116},
isbn={978-989-758-547-0},
issn={2184-433X},
}

TY - CONF

JO - Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Safe Policy Improvement Approaches on Discrete Markov Decision Processes
SN - 978-989-758-547-0
IS - 2184-433X
AU - Scholl, P.
AU - Dietrich, F.
AU - Otte, C.
AU - Udluft, S.
PY - 2022
SP - 142
EP - 151
DO - 10.5220/0010786600003116
PB - SciTePress