Abstract
Research in Deep Reinforcement Learning (DRL) has made extensive use of video game environments for benchmarking over the last few years. Most studies advocate learning from high-dimensional sensory inputs (i.e., images) as observations, in order to simulate scenarios that more closely approximate reality. However, when using these video game environments, the common practice is to provide the agent with a reward signal computed by accessing the environment's internal state, a resource that is rarely available when applying DRL to real-world problems. Thus, we propose a reward function that uses only the images received as observations. The proposal is evaluated in the Sonic the Hedgehog game. We analyze the agent's learning under the proposed reward function and find that, in most cases, its performance is similar to that obtained when accessing the environment's internal state.
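As an illustration only (not the authors' exact formulation), a reward computed purely from observation images could use the apparent motion of matched keypoints between consecutive frames: in a side-scroller such as Sonic, the background shifting left suggests the agent moved right. A minimal sketch, assuming keypoints have already been matched across the two frames:

```python
import numpy as np

def progress_reward(prev_pts, curr_pts):
    """Illustrative image-only reward: median horizontal shift of matched
    keypoints between two consecutive frames. In a side-scroller, the scene
    moving left (negative shift) indicates rightward progress, so we negate it."""
    shift = np.median(curr_pts[:, 0] - prev_pts[:, 0])
    return float(-shift)

# Three matched keypoints; the whole scene shifted 4 px to the left.
prev_pts = np.array([[10.0, 5.0], [40.0, 8.0], [70.0, 3.0]])
curr_pts = np.array([[ 6.0, 5.0], [36.0, 8.0], [66.0, 3.0]])
print(progress_reward(prev_pts, curr_pts))  # 4.0
```

The median (rather than the mean) makes the estimate robust to a few spurious matches, e.g. keypoints on moving enemies rather than the background.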
Notes
- 1. In this work, we adopted Lowe's ratio test with a threshold of 0.7.
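Note 1 adopts Lowe's ratio test with a threshold of 0.7: a candidate match is kept only if its nearest neighbour is clearly closer than the second nearest. A minimal sketch, using synthetic descriptors and plain Euclidean distance in place of real keypoint descriptors:

```python
import numpy as np

def ratio_test(query_desc, train_desc, ratio=0.7):
    """Keep a match (i, j) only if the nearest neighbour of query i is
    closer than `ratio` times the second-nearest (Lowe's ratio test)."""
    matches = []
    for i, q in enumerate(query_desc):
        d = np.linalg.norm(train_desc - q, axis=1)  # distances to all candidates
        nn = np.argsort(d)[:2]                      # indices of the two nearest
        if d[nn[0]] < ratio * d[nn[1]]:
            matches.append((i, int(nn[0])))
    return matches

# Descriptor 0 has one unambiguous match; descriptor 1 has two near-equal
# candidates and is therefore rejected as ambiguous.
query = np.array([[0.0, 0.0], [5.0, 5.0]])
train = np.array([[0.1, 0.0], [3.0, 3.0], [5.1, 5.0], [5.0, 5.1]])
print(ratio_test(query, train))  # [(0, 0)]
```

With real image features this filtering is typically done on descriptor distances from a k-nearest-neighbour matcher (k = 2), as in the SIFT matching procedure of Lowe (2004).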
References
Arora, S., Doshi, P.: A survey of inverse reinforcement learning: challenges, methods and progress. Artif. Intell. 297, 103500 (2021)
Bay, H., Tuytelaars, T., Van Gool, L.: SURF: speeded up robust features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 404–417. Springer, Heidelberg (2006). https://doi.org/10.1007/11744023_32
Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013)
Bradski, G., Kaehler, A.: Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly Media, Inc. (2008)
Brockman, G., et al.: OpenAI gym. arXiv preprint arXiv:1606.01540 (2016)
Calonder, M., Lepetit, V., Strecha, C., Fua, P.: BRIEF: binary robust independent elementary features. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 778–792. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15561-1_56
Chiang, I., Huang, C.M., Cheng, N.H., Liu, H.Y., Tsai, S.C., et al.: Efficient exploration in side-scrolling video games with trajectory replay. Comput. Games J. 9(3), 263–280 (2020)
Dewey, D.: Reinforcement learning and the reward engineering principle. In: 2014 AAAI Spring Symposium Series (2014)
Fayjie, A.R., Hossain, S., Oualid, D., Lee, D.J.: Driverless car: autonomous driving using deep reinforcement learning in urban environment. In: 2018 15th International Conference on Ubiquitous Robots (UR), pp. 896–901. IEEE (2018)
Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: a survey. J. Artif. Intell. Res. 4, 237–285 (1996)
Laud, A.D.: Theory and Application of Reward Shaping in Reinforcement Learning. University of Illinois at Urbana-Champaign (2004)
Li, Y.: Deep reinforcement learning: an overview. arXiv preprint arXiv:1701.07274 (2017)
Lowe, D.G.: Object recognition from local scale-invariant features. In: Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, pp. 1150–1157. IEEE (1999)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
Lowe, G.: SIFT - the scale invariant feature transform. Int. J. 2(91–110), 2 (2004)
McKnight, P.E., Najab, J.: Mann-Whitney U test. Corsini Encycl. Psychol. pp. 1–1 (2010)
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
Muja, M., Lowe, D.G.: Fast approximate nearest neighbors with automatic algorithm configuration. VISAPP (1) 2(331–340), 2 (2009)
Ng, A.Y., Harada, D., Russell, S.: Policy invariance under reward transformations: theory and application to reward shaping. In: ICML, vol. 99, pp. 278–287 (1999)
Nichol, A., Pfau, V., Hesse, C., Klimov, O., Schulman, J.: Gotta learn fast: a new benchmark for generalization in RL. arXiv preprint arXiv:1804.03720 (2018)
Paszke, A., et al.: Automatic differentiation in PyTorch. In: Proceedings of the Conference on Neural Information Processing Systems (NIPS) (2017)
Pathak, D., Agrawal, P., Efros, A.A., Darrell, T.: Curiosity-driven exploration by self-supervised prediction. In: International Conference on Machine Learning, pp. 2778–2787. PMLR (2017)
Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., Dormann, N.: Stable-baselines3: reliable reinforcement learning implementations. J. Mach. Learn. Res. 22(268), 1–8 (2021)
Rosten, E., Drummond, T.: Machine learning for high-speed corner detection. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 430–443. Springer, Heidelberg (2006). https://doi.org/10.1007/11744023_34
Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: ORB: an efficient alternative to SIFT or SURF. In: 2011 International Conference on Computer Vision, pp. 2564–2571. IEEE (2011)
Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897. PMLR (2015)
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)
Shao, K., Tang, Z., Zhu, Y., Li, N., Zhao, D.: A survey of deep reinforcement learning in video games. arXiv preprint arXiv:1912.10944 (2019)
Silver, D., et al.: Mastering the game of go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT press, Cambridge (2018)
Acknowledgements
The authors thank CNPq (316801/2021-6), Capes, FAPEMIG (APQ-01832-22), and UFJF for their financial support.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
de Souza, F.R., Miranda, T.S., Bernardino, H.S. (2022). A Reward Function Using Image Processing for a Deep Reinforcement Learning Approach Applied to the Sonic the Hedgehog Game. In: Xavier-Junior, J.C., Rios, R.A. (eds) Intelligent Systems. BRACIS 2022. Lecture Notes in Computer Science, vol. 13654. Springer, Cham. https://doi.org/10.1007/978-3-031-21689-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21688-6
Online ISBN: 978-3-031-21689-3