Abstract:
Unsupervised pre-training in reinforcement learning enables the agent to gain prior environmental knowledge, which is then fine-tuned in the supervised stage to quickly adapt to various downstream tasks. In the absence of task-related rewards, pre-training aims to acquire policies (i.e., behaviors) that generate diverse trajectories to explore and master the environment. Previous research assigns states to their associated behaviors by learning a supervised discriminator. However, an underlying problem persists: such a discriminator is trained with a lack of relevant data, leading to underestimated rewards for new states and inadequate exploration. To this end, we introduce an unsupervised active pre-training algorithm for diverse behavior induction (APD). We explicitly characterize the behavior variables with a state-dependent sampling method, so that the agent can decompose the entire state space into parts for fine-grained and diverse behavior learning. Specifically, a particle-based entropy estimator is applied to optimize a combination of behavioral entropy and a mutual information objective. Moreover, we develop behavior-based representation learning to compress states into a latent space. Experiments show that our method improves exploration efficiency and outperforms most state-of-the-art unsupervised algorithms on a number of continuous control tasks in the DeepMind Control Suite.
Published in: IEEE Robotics and Automation Letters (Volume: 7, Issue: 4, October 2022)
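The abstract refers to a particle-based entropy estimator used as an intrinsic reward during pre-training. The sketch below is a minimal, hedged illustration of the general k-nearest-neighbor style of particle-based entropy estimation; the function name `particle_entropy`, the choice of k, and the log transform are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def particle_entropy(states: np.ndarray, k: int = 5) -> np.ndarray:
    """Per-state particle-based entropy estimate (illustrative sketch).

    The log distance from each state to its k-th nearest neighbor in the
    batch serves as a proxy for local state entropy / intrinsic reward.
    """
    # Pairwise Euclidean distances between all states in the batch.
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    # Distance to the k-th nearest neighbor (index 0 is the state itself).
    knn_dists = np.sort(dists, axis=-1)[:, k]
    # log(1 + d_k) keeps the estimate non-negative and numerically stable.
    return np.log(1.0 + knn_dists)

# Example usage: a batch of 256 latent states of dimension 16.
rng = np.random.default_rng(0)
batch = rng.normal(size=(256, 16))
intrinsic_reward = particle_entropy(batch, k=5)
print(intrinsic_reward.shape)  # (256,)
```

In practice such an estimate is typically computed on latent representations (as with the behavior-based representation learning mentioned above) rather than raw observations, and combined with a mutual-information term between states and behavior variables.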