Abstract:
Given a stochastic policy learned by reinforcement, we wish to ensure that it can be deployed on a robot with demonstrably low probability of unsafe behavior. Our case study is about learning to reach target objects positioned close to obstacles, and ensuring a reasonably low collision probability. Learning is carried out in a simulator to avoid physical damage in the trial-and-error phase. Once a policy is learned, we analyze it with probabilistic model checking tools to identify and correct potential unsafe behaviors. The whole process is automated and, in principle, it can be integrated step-by-step with routine task-learning. As our results demonstrate, automated fixing of policies is both feasible and highly effective in bounding the probability of unsafe behaviors.
Date of Conference: 03-07 November 2013
Date Added to IEEE Xplore: 02 January 2014