Impact Statement:
PPO-ARC is a novel RL algorithm based on conservative policy iteration (CPI). Compared with traditional PPO, PPO-ARC updates the policy through parallel competitive optimization, and an evaluation of the previous policy's advantage is involved in this process. Most RL algorithms concentrate on the interaction between the policy and the environment and make no use of previous policies. PPO-ARC addresses this issue by introducing the off-policy advantage. Through theoretical and experimental analysis, we demonstrate that the proposed algorithm improves performance during policy updates by making full use of sampled data. Moreover, PPO-ARC is extremely simple to use and can therefore be adopted in the many real-world projects that already rely on PPO.
Abstract:
In recent years, reinforcement learning (RL) has made great achievements in artificial intelligence. Proximal policy optimization (PPO) is a representative RL algorithm, which limits the magnitude of each policy update to achieve monotonic policy improvement. However, as an on-policy algorithm, PPO suffers from sample inefficiency and poor policy exploration. To address these problems, the off-policy advantage is proposed, which calculates the advantage function through reuse of the previous policy, yielding proximal policy optimization with advantage reuse (PPO-AR). Furthermore, to improve the sampling efficiency of the policy update, proximal policy optimization with advantage reuse competition (PPO-ARC) is proposed, which introduces PPO-AR into the policy calculation and uses parallel competitive optimization; it is shown to improve policy performance. Moreover, to improve the exploration of the policy update, the proximal policy optimization with generalized cli...
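For context, the "limits the magnitude of each policy update" mentioned above refers to the standard PPO clipped surrogate objective, sketched below in LaTeX with the usual definitions of the probability ratio r_t(\theta), clipping range \epsilon, and advantage estimate \hat{A}_t. The second line is only an assumed illustration of how an "advantage reuse" estimate might blend advantages from the current and previous policies with a mixing coefficient \lambda; it is not the paper's exact PPO-AR/PPO-ARC formulation.

L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t \Big[ \min\big( r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \big) \Big], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}

\hat{A}^{\mathrm{AR}}_t = (1 - \lambda)\,\hat{A}^{\pi_\theta}_t + \lambda\,\hat{A}^{\pi_{\mathrm{prev}}}_t, \qquad \lambda \in [0, 1] \ \text{(assumed mixing coefficient, for illustration only)}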
Published in: IEEE Transactions on Artificial Intelligence (Volume: 5, Issue: 8, August 2024)