Abstract:
One of the serious problems with Reinforcement Learning (RL) algorithms is that their performance usually varies when the same experiment is repeated or reproduced. Although RL results are hard to reproduce because of the algorithms' intrinsic variance, this variance has not been investigated systematically. Through a case study on the Flappy Bird environment, we introduce and characterize four important factors that affect the performance inconsistency of RL algorithms: 1) the level of environment randomness, 2) the order of the action-value update process, 3) the exploration rate strategy, and 4) the choice between on- and off-policy algorithms. Using a quantitative metric (the coefficient of variation), we compare and analyze the effect of each factor on the performance inconsistency/variance in RL. We believe our experimental results and analysis will provide opportunities to obtain an efficient agent that reproduces more consistent performance results.
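The coefficient of variation named above is conventionally the standard deviation of a set of results divided by their mean, so a lower value indicates more consistent performance across repeated runs. The following is a minimal sketch of that computation, not the paper's own code: the function name, the use of the sample (n-1) standard deviation, and the score lists are illustrative assumptions, since the abstract does not state these details.

```python
from statistics import mean, stdev


def coefficient_of_variation(scores):
    """Return the sample standard deviation of scores divided by |mean|.

    Assumption: the sample (n-1) standard deviation is used; the abstract
    does not say which estimator the authors chose.
    """
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate variation")
    mu = mean(scores)
    if mu == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    return stdev(scores) / abs(mu)


# Hypothetical scores from repeating the same experiment four times under
# two settings. These numbers are made up for illustration only.
runs_a = [10.0, 12.0, 8.0, 10.0]
runs_b = [2.0, 18.0, 5.0, 15.0]

cv_a = coefficient_of_variation(runs_a)
cv_b = coefficient_of_variation(runs_b)

# Both settings have the same mean score, but the second varies far more
# from run to run, which shows up as a larger coefficient of variation.
print(f"CV setting A: {cv_a:.3f}")
print(f"CV setting B: {cv_b:.3f}")
```

Because the metric is normalized by the mean, it allows settings with different average scores to be compared on the same scale, which is presumably why it suits a study of run-to-run inconsistency.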
Published in: 2021 International Conference on Information and Communication Technology Convergence (ICTC)
Date of Conference: 20-22 October 2021
Date Added to IEEE Xplore: 07 December 2021
ISBN Information:
Print on Demand (PoD) ISSN: 2162-1233