ABSTRACT
Hand-crafted reward functions have never been a scalable solution for real-world problems. Self-generated intrinsic rewards inspired by human curiosity may be one scalable answer to the sparse-reward problem. This research therefore investigated the effectiveness of selected techniques based on the theory of curiosity-driven exploration. Six algorithms in total, covering count-based, prediction-based, and other methods, were evaluated on various OpenAI Gym environments. The results showed that, compared with the baseline, the exploration algorithms improved the agent's ability to find optimal solutions in many cases. Still, there is no clear winner among the selected exploration methods, and the best scalable exploration strategy remains an open question. One finding is that adding a small intrinsic reward signal as noise helps improve sample efficiency in the short run.
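To make the two main families concrete, the sketch below illustrates, under assumptions of our own rather than details given in the paper, a count-based bonus that decays with state visitation and a prediction-based bonus equal to the error of a learned forward model. The hyperparameters `BETA` and `lr`, the tabular counts, and the linear forward model are all illustrative choices, not the paper's implementations.

```python
import numpy as np
from collections import Counter

# --- Count-based bonus: the bonus shrinks as the visitation count of a
# state grows, so rarely seen states look more rewarding.
# BETA and the tabular (hashable-state) representation are assumptions.
BETA = 0.1
visit_counts = Counter()

def count_bonus(state):
    visit_counts[state] += 1
    return BETA / np.sqrt(visit_counts[state])

# --- Prediction-based bonus: a small forward model predicts the next
# state; its squared error is the intrinsic reward, so transitions the
# model has already learned stop being "interesting". The linear model
# and learning rate are illustrative assumptions.
class ForwardModelBonus:
    def __init__(self, state_dim, action_dim, lr=1e-2):
        self.W = np.zeros((state_dim, state_dim + action_dim))
        self.lr = lr

    def __call__(self, state, action, next_state):
        x = np.concatenate([state, action])
        error = next_state - self.W @ x
        self.W += self.lr * np.outer(error, x)  # one SGD step on the model
        return float(error @ error)

# Either bonus is simply added to the sparse extrinsic reward:
#   total_reward = extrinsic_reward + intrinsic_bonus
```

In a Gym-style loop, one would compute `count_bonus(obs)` for discrete observations, or the forward-model bonus for vector observations, at every step and add it to the environment reward before the agent's update.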
Index Terms
- Curiosity-Driven Exploration Effectiveness on Various Environments