Abstract:
The sequential decision-making problem with large-scale state spaces is an important and challenging topic for multitask reinforcement learning (MTRL). Training near-optimal policies across tasks suffers from a deficiency of prior knowledge in discrete-time nonlinear environments, especially under continuous task variations, and requires scalable approaches that transfer prior knowledge to new tasks when the number of tasks is large. This paper proposes a multitask policy adversarial learning (MTPAL) method for learning a nonlinear feedback policy that generalizes across multiple tasks, bringing the cognitive ability of a robot much closer to human-level decision making. The key idea is to construct a parametrized policy model directly from large high-dimensional observations using deep function approximators, and then to train an optimal sequential decision policy for each new task through an adversarial process in which two models are trained simultaneously: a multitask policy generator transforms samples drawn from a prior distribution into samples from a complex, higher-dimensional data distribution, and a multitask policy discriminator decides whether a given sample comes from the empirically derived human-level prior distribution or from the generator. All of the empirically derived human-level knowledge is integrated into the sequential decision policy, transferring human-level policy at every layer of a deep policy network. Extensive experimental results on four different WeiChai Power manufacturing data sets show that our approach can surpass human performance on tasks ranging from cart-pole to production assembly control.
Published in: IEEE Transactions on Industrial Informatics (Volume: 15, Issue: 4, April 2019)
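
The generator-versus-discriminator training the abstract describes is structurally similar to a generative adversarial setup conditioned on a task variable. The following is a minimal sketch of that idea, not the authors' implementation: it assumes PyTorch, and the network shapes, dimensions (`STATE_DIM`, `TASK_DIM`, etc.), and the synthetic stand-in for human-derived demonstrations are all illustrative assumptions.

```python
# Minimal sketch of adversarial multitask policy learning (assumptions noted
# above): a generator G proposes actions from a latent prior given a task
# descriptor; a discriminator D scores whether a (state, action, task) tuple
# looks like a human-derived demonstration or a generated sample.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, TASK_DIM, LATENT_DIM = 4, 1, 2, 8  # illustrative sizes

class PolicyGenerator(nn.Module):
    """Maps a latent prior sample plus a task descriptor to an action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + TASK_DIM + STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM), nn.Tanh())

    def forward(self, z, task, state):
        return self.net(torch.cat([z, task, state], dim=-1))

class PolicyDiscriminator(nn.Module):
    """Scores whether a (state, action, task) tuple is human-derived."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM + TASK_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, state, action, task):
        return self.net(torch.cat([state, action, task], dim=-1))

G, D = PolicyGenerator(), PolicyDiscriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    # Synthetic stand-ins for human demonstrations across tasks (assumption).
    state = torch.randn(32, STATE_DIM)
    task = torch.randn(32, TASK_DIM)
    expert_action = torch.tanh(state[:, :ACTION_DIM])

    # Discriminator update: label expert tuples 1, generated tuples 0.
    z = torch.randn(32, LATENT_DIM)
    fake_action = G(z, task, state).detach()
    d_loss = (bce(D(state, expert_action, task), torch.ones(32, 1)) +
              bce(D(state, fake_action, task), torch.zeros(32, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: produce actions the discriminator accepts as expert.
    z = torch.randn(32, LATENT_DIM)
    g_loss = bce(D(state, G(z, task, state), task), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

At convergence of this kind of game, the generator's conditional action distribution becomes hard to distinguish from the human-derived one for each task descriptor, which is the mechanism by which the abstract's "human-level empirically derived" knowledge would be absorbed into the policy network.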