Authors:
Philipp Scholl 1,2; Felix Dietrich 3; Clemens Otte 2 and Steffen Udluft 2
Affiliations:
1 Department of Mathematics, Ludwig-Maximilian-University of Munich, Munich, Germany
2 Learning Systems, Siemens Technology, Munich, Germany
3 Department of Informatics, Technical University of Munich, Munich, Germany
Keyword(s):
Risk-sensitive Reinforcement Learning, Safe Policy Improvement, Markov Decision Processes.
Abstract:
Safe Policy Improvement (SPI) aims to provide provable guarantees that a learned policy is at least approximately as good as a given baseline policy. Building on SPI with Soft Baseline Bootstrapping (Soft-SPIBB) by Nadjahi et al., we identify theoretical issues in their approach, provide a corrected theory, and derive a new algorithm that is provably safe on finite Markov Decision Processes (MDPs). Additionally, we provide a heuristic algorithm that exhibits the best performance among many state-of-the-art SPI algorithms on two different benchmarks. Furthermore, we introduce a taxonomy of SPI algorithms and empirically show an interesting property of two classes of SPI algorithms: while algorithms that incorporate the uncertainty as a penalty on the action-value achieve a higher mean performance, algorithms that actively restrict the set of policies produce good policies more consistently and are, thus, safer.
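To illustrate the two classes of SPI algorithms contrasted in the abstract, the following minimal sketch shows (a) acting greedily on an uncertainty-penalized action-value and (b) restricting the learned policy to the baseline on rarely observed state-action pairs. This is not the paper's implementation: the function names, the count-based uncertainty term, and the threshold `n_min` are illustrative assumptions only.

```python
import numpy as np

# Toy tabular setting: q_hat are action-values estimated from logged data,
# counts are visit counts of each (state, action) pair in that data,
# pi_b is the baseline (behavior) policy, shape (S, A).
S, A = 4, 3
rng = np.random.default_rng(0)
q_hat = rng.normal(size=(S, A))
counts = rng.integers(0, 20, size=(S, A))
pi_b = np.full((S, A), 1.0 / A)

# Count-based uncertainty placeholder (a simple illustrative choice,
# not the bound used in the paper).
uncertainty = 1.0 / np.sqrt(np.maximum(counts, 1))

def greedy_with_penalty(q, unc, kappa=1.0):
    """Class 1: act greedily on the uncertainty-penalized action-value."""
    q_pen = q - kappa * unc
    pi = np.zeros_like(q)
    pi[np.arange(q.shape[0]), q_pen.argmax(axis=1)] = 1.0
    return pi

def greedy_restricted(q, counts, pi_b, n_min=10):
    """Class 2: restrict the policy set -- keep the baseline's probabilities
    on poorly observed actions and only move mass among well-observed ones."""
    pi = pi_b.copy()
    for s in range(q.shape[0]):
        safe = counts[s] >= n_min          # actions we are allowed to change
        if not safe.any():
            continue                       # no reliable data: keep the baseline
        free_mass = pi_b[s, safe].sum()    # probability mass we may redistribute
        pi[s, safe] = 0.0
        best = np.flatnonzero(safe)[q[s, safe].argmax()]
        pi[s, best] = free_mass            # greedy only among well-observed actions
    return pi

pi_penalty = greedy_with_penalty(q_hat, uncertainty)
pi_restricted = greedy_restricted(q_hat, counts, pi_b)
print(pi_penalty)
print(pi_restricted)
```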