A One-shot Convex Optimization Approach to Risk-Averse Q-Learning


Abstract:

This paper presents a model-free Q-learning algorithm for solving the risk-averse optimal control (RAOC) problem. The entropic risk measure is used in the RAOC problem to account for the variance of the objective function. A one-shot Q-based convex optimization problem is then formed, in which the decision variables are the Q-function parameters and the constraints are obtained by sampling from an exponential utility-based entropic Bellman inequality. The samples are constructed using only a batch of data collected from a variety of control policies in a fully off-policy manner, which turns a dataset into a Q-learning-based engine for risk-averse optimal policies. Convergence of the exact optimization problem, which is infinite-dimensional in both decision variables and constraints, to the optimal risk-averse Q-function is shown. For the standard convex optimization problem, in which function approximation of the Q-values and constraint sampling are leveraged, the performance of the approximate solutions is verified through a weighted-norm bound and a Lyapunov bound. A simulation example is provided to verify the effectiveness of the presented approach.
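
To make the abstract's ingredients concrete, the following is a minimal sketch, not the paper's implementation: it illustrates (i) the entropic risk measure evaluated on sampled costs and (ii) a one-shot convex program over Q-function parameters whose constraints are Bellman inequalities sampled from an off-policy batch of transitions. The tabular Q-function, the randomly generated MDP, the cvxpy modeling package, the parameters beta and gamma, and the use of the risk-neutral limit of the Bellman inequality are all assumptions made for illustration; the paper's exponential utility-based entropic constraint set and its function-approximation analysis are not reproduced here.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)

def entropic_risk(samples, beta):
    # Entropic risk of sampled costs: (1/beta) * log E[exp(beta * X)].
    # beta -> 0 recovers the sample mean; larger beta > 0 penalizes cost variance more.
    return float(np.log(np.mean(np.exp(beta * np.asarray(samples)))) / beta)

# The risk measure by itself: for costs ~ N(1, 0.5^2) the entropic value exceeds the mean.
cost_samples = rng.normal(1.0, 0.5, size=1000)
print("mean cost:", round(cost_samples.mean(), 3),
      "| entropic risk (beta = 2):", round(entropic_risk(cost_samples, 2.0), 3))

# --- a small synthetic MDP and an off-policy batch of transitions (s, a, cost, s') ---
nS, nA, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # transition kernel (unknown to the learner)
C = rng.uniform(0.0, 1.0, size=(nS, nA))        # stage cost
batch = []
for _ in range(500):                            # data gathered under arbitrary behavior policies
    s, a = rng.integers(nS), rng.integers(nA)
    batch.append((s, a, C[s, a], rng.choice(nS, p=P[s, a])))

# --- one-shot convex program over (here: tabular) Q-function parameters ---
# Maximize the sum of Q subject to sampled Bellman inequalities
#   Q(s, a) <= cost + gamma * Q(s', a')  for every a',
# i.e. the risk-neutral limit of the entropic Bellman inequality described in the abstract.
Q = cp.Variable((nS, nA))
constraints = [Q[s, a] <= c + gamma * Q[s_next, a_next]
               for (s, a, c, s_next) in batch
               for a_next in range(nA)]
constraints.append(Q <= 1.0 / (1.0 - gamma))    # cap keeps the program bounded if a pair is unsampled
problem = cp.Problem(cp.Maximize(cp.sum(Q)), constraints)
problem.solve()

print("estimated Q:\n", np.round(Q.value, 3))
print("greedy (cost-minimizing) policy:", np.argmin(Q.value, axis=1))

Increasing beta in entropic_risk moves the risk value from the sample mean toward the worst observed cost, which is the variance-penalizing behavior the abstract attributes to the entropic risk measure; enumerating every next action a' in the sampled constraints is what lets the minimization inside the Bellman operator be expressed with linear inequalities, so the whole program stays convex and can be solved in one shot from the batch.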
Date of Conference: 14-17 December 2021
Date Added to IEEE Xplore: 01 February 2022
Conference Location: Austin, TX, USA


