Abstract
Exploring novel environments toward a specific target poses the challenge of how to adequately provide positive external rewards to an artificial agent. In scenarios with sparse external rewards, a reinforcement learning algorithm often cannot develop a successful policy function to govern an agent's behavior. Intrinsic rewards, however, can provide feedback on an agent's actions and enable updates toward a proper policy function even in such sparse scenarios. Our approaches, the Optical Flow-Augmented Curiosity Module (OF-ACM) and the Depth-Augmented Curiosity Module (D-ACM), extend the Intrinsic Curiosity Module (ICM) of Pathak et al. The ICM forms an intrinsic reward signal from the error between a prediction of the next state and its ground truth. Through experiments in visually rich and feature-sparse scenarios in ViZDoom, we show that our predictive modules exhibit improved exploration capabilities and faster learning of an ideal policy function. Our modules leverage additional sources of information, such as depth images and optical flow, to generate superior embeddings that serve as inputs for next-state prediction. With D-ACM we show a 63.3% average improvement over the ICM in time to policy convergence on "My Way Home" scenarios.
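The curiosity signal described above can be sketched in a few lines: a forward model predicts the embedding of the next state from the current state embedding and the chosen action, and the intrinsic reward is the scaled prediction error. The sketch below is a minimal toy version, not the paper's implementation; the linear forward model, the embedding dimension, and the scaling factor `eta` are all illustrative assumptions (the actual ICM and our modules use learned neural networks).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 16-d state embedding phi(s) and 4 discrete actions.
EMB_DIM, N_ACTIONS = 16, 4

# Toy linear forward model standing in for the learned network:
# predicts phi(s_{t+1}) from the concatenation [phi(s_t); one_hot(a_t)].
W = rng.normal(scale=0.1, size=(EMB_DIM, EMB_DIM + N_ACTIONS))

def one_hot(action, n=N_ACTIONS):
    v = np.zeros(n)
    v[action] = 1.0
    return v

def intrinsic_reward(phi_t, action, phi_next, eta=0.5):
    """ICM-style reward: r_i = (eta / 2) * ||phi_hat(s_{t+1}) - phi(s_{t+1})||^2."""
    phi_hat = W @ np.concatenate([phi_t, one_hot(action)])
    return 0.5 * eta * float(np.sum((phi_hat - phi_next) ** 2))

phi_t = rng.normal(size=EMB_DIM)      # stand-in embedding of the current state
phi_next = rng.normal(size=EMB_DIM)   # stand-in embedding of the observed next state
r = intrinsic_reward(phi_t, action=2, phi_next=phi_next)
# r is non-negative by construction; it is large where the forward model
# predicts poorly, i.e. in states the agent has not yet learned about.
```

States the forward model predicts well yield near-zero reward, so the agent is pushed toward transitions it cannot yet predict; OF-ACM and D-ACM change only how the embeddings are produced (from optical flow and depth, respectively), not this reward structure.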
J. Carvajal, T. Molnar and L. Burzawa—Equal Contribution.
References
Brockman, G., et al.: OpenAI gym. CoRR abs/1606.01540 (2016)
Farnebäck, G.: Two-frame motion estimation based on polynomial expansion. In: Bigun, J., Gustavsson, T. (eds.) SCIA 2003. LNCS, vol. 2749, pp. 363–370. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-45103-X_50
Grześ, M.: Reward shaping in episodic reinforcement learning. In: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, AAMAS 2017, pp. 565–573. International Foundation for Autonomous Agents and Multiagent Systems, Richland (2017)
He, Y., Chen, S.: Advances in sensing and processing methods for three-dimensional robot vision. Int. J. Adv. Robot. Syst. 15(2) (2018). https://doi.org/10.1177/1729881418760623
Kempka, M., Wydmuch, M., Runc, G., Toczek, J., Jaskowski, W.: ViZDoom: a doom-based AI research platform for visual reinforcement learning. CoRR abs/1605.02097 (2016)
Lu, F., Milios, E.: Globally consistent range scan alignment for environment mapping. Auton. Robots 4(4), 333–349 (1997)
Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. CoRR abs/1602.01783 (2016)
Pathak, D., Agrawal, P., Efros, A.A., Darrell, T.: Curiosity-driven exploration by self-supervised prediction. In: ICML (2017)
Tai, L., Liu, M.: Towards cognitive exploration through deep reinforcement learning for mobile robots. CoRR abs/1610.01733 (2016)
Wu, Y., Mansimov, E., Liao, S., Grosse, R.B., Ba, J.: Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. CoRR abs/1708.05144 (2017)
Zhang, M., Levine, S., McCarthy, Z., Finn, C., Abbeel, P.: Policy learning with continuous memory states for partially observed robotic control. CoRR abs/1507.01273 (2015)
© 2019 Springer Nature Switzerland AG
Cite this paper
Carvajal, J., Molnar, T., Burzawa, L., Culurciello, E. (2019). Augmented Curiosity: Depth and Optical Flow Prediction for Efficient Exploration. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2019. Lecture Notes in Computer Science(), vol 11844. Springer, Cham. https://doi.org/10.1007/978-3-030-33720-9_20
Print ISBN: 978-3-030-33719-3
Online ISBN: 978-3-030-33720-9