Abstract:
This paper reports on our extensive experimentation, over the last two years, with deep reinforcement learning techniques for training an agent to move through the dungeons of the famous Rogue video game. The challenging nature of the problem is tightly related to the procedural, random generation of new dungeon maps at each level, which forbids any form of level-specific learning and forces us to address the navigation problem in its full generality. Other aspects of the game that are interesting from the point of view of automatic learning are the partially observable nature of the problem, since maps are initially not visible and are discovered during exploration, and the problem of sparse rewards, requiring the acquisition of complex, nonreactive behaviors involving memory and planning. In this paper, we build on previous work to make a more systematic comparison of different learning techniques, focusing in particular on Asynchronous Advantage Actor-Critic (A3C) and Actor-Critic with Experience Replay (ACER). In a game like Rogue, the sparsity of rewards is mitigated by the variability of the dungeon configurations (sometimes, by luck, the exit is at hand); if this variability can be tamed, as ACER seems able to do better than other algorithms, the problem of sparse rewards can be overcome without any need for intrinsic motivation.
Published in: IEEE Transactions on Games ( Volume: 12, Issue: 2, June 2020)