Abstract
Research in Deep Reinforcement Learning (DRL) has made extensive use of video game environments for benchmarking over the last few years. Most studies advocate learning from high-dimensional sensory inputs (i.e., images) as observations, in order to simulate scenarios that more closely approximate reality. However, when using these video game environments, the common practice is to provide the agent with a reward signal computed by accessing the environment's internal state, a resource that is rarely available when applying DRL to real-world problems. Thus, we propose a reward function that uses only the images received as observations. The proposal is evaluated in the Sonic the Hedgehog game. We analyze the agent's learning under the proposed reward function and find that, in most cases, its performance is similar to that obtained when accessing the environment's internal state.
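As an illustration only (not the authors' exact formulation), a reward computed purely from observation images could use the apparent motion of matched keypoints between consecutive frames: in a side-scroller such as Sonic, the background shifting left suggests the agent moved right. A minimal sketch, assuming keypoints have already been matched across the two frames:

```python
import numpy as np

def progress_reward(prev_pts, curr_pts):
    """Illustrative image-only reward: median horizontal shift of matched
    keypoints between two consecutive frames. In a side-scroller, the scene
    moving left (negative shift) indicates rightward progress, so we negate it."""
    shift = np.median(curr_pts[:, 0] - prev_pts[:, 0])
    return float(-shift)

# Three matched keypoints; the whole scene shifted 4 px to the left.
prev_pts = np.array([[10.0, 5.0], [40.0, 8.0], [70.0, 3.0]])
curr_pts = np.array([[ 6.0, 5.0], [36.0, 8.0], [66.0, 3.0]])
print(progress_reward(prev_pts, curr_pts))  # 4.0
```

The median (rather than the mean) makes the estimate robust to a few spurious matches, e.g. keypoints on moving enemies rather than the background.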
Notes
- 1. In this work, we adopted Lowe's ratio test with a threshold of 0.7.
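Note 1 adopts Lowe's ratio test with a threshold of 0.7: a candidate match is kept only if its nearest neighbour is clearly closer than the second nearest. A minimal sketch, using synthetic descriptors and plain Euclidean distance in place of real keypoint descriptors:

```python
import numpy as np

def ratio_test(query_desc, train_desc, ratio=0.7):
    """Keep a match (i, j) only if the nearest neighbour of query i is
    closer than `ratio` times the second-nearest (Lowe's ratio test)."""
    matches = []
    for i, q in enumerate(query_desc):
        d = np.linalg.norm(train_desc - q, axis=1)  # distances to all candidates
        nn = np.argsort(d)[:2]                      # indices of the two nearest
        if d[nn[0]] < ratio * d[nn[1]]:
            matches.append((i, int(nn[0])))
    return matches

# Descriptor 0 has one unambiguous match; descriptor 1 has two near-equal
# candidates and is therefore rejected as ambiguous.
query = np.array([[0.0, 0.0], [5.0, 5.0]])
train = np.array([[0.1, 0.0], [3.0, 3.0], [5.1, 5.0], [5.0, 5.1]])
print(ratio_test(query, train))  # [(0, 0)]
```

With real image features this filtering is typically done on descriptor distances from a k-nearest-neighbour matcher (k = 2), as in the SIFT matching procedure of Lowe (2004).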
References
Arora, S., Doshi, P.: A survey of inverse reinforcement learning: challenges, methods and progress. Artif. Intell. 297, 103500 (2021)
Bay, H., Tuytelaars, T., Van Gool, L.: SURF: speeded up robust features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 404–417. Springer, Heidelberg (2006). https://doi.org/10.1007/11744023_32
Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013)
Bradski, G., Kaehler, A.: Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly Media, Inc. (2008)
Brockman, G., et al.: OpenAI gym. arXiv preprint arXiv:1606.01540 (2016)
Calonder, M., Lepetit, V., Strecha, C., Fua, P.: BRIEF: binary robust independent elementary features. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 778–792. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15561-1_56
Chiang, I., Huang, C.M., Cheng, N.H., Liu, H.Y., Tsai, S.C., et al.: Efficient exploration in side-scrolling video games with trajectory replay. Comput. Games J. 9(3), 263–280 (2020)
Dewey, D.: Reinforcement learning and the reward engineering principle. In: 2014 AAAI Spring Symposium Series (2014)
Fayjie, A.R., Hossain, S., Oualid, D., Lee, D.J.: Driverless car: autonomous driving using deep reinforcement learning in urban environment. In: 2018 15th International Conference on Ubiquitous Robots (UR), pp. 896–901. IEEE (2018)
Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: a survey. J. Artif. Intell. Res. 4, 237–285 (1996)
Laud, A.D.: Theory and Application of Reward Shaping in Reinforcement Learning. University of Illinois at Urbana-Champaign (2004)
Li, Y.: Deep reinforcement learning: an overview. arXiv preprint arXiv:1701.07274 (2017)
Lowe, D.G.: Object recognition from local scale-invariant features. In: Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, pp. 1150–1157. IEEE (1999)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
Lowe, G.: SIFT - the scale invariant feature transform. Int. J. 2(91–110), 2 (2004)
McKnight, P.E., Najab, J.: Mann-Whitney U test. Corsini Encycl. Psychol. pp. 1–1 (2010)
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
Muja, M., Lowe, D.G.: Fast approximate nearest neighbors with automatic algorithm configuration. VISAPP (1) 2(331–340), 2 (2009)
Ng, A.Y., Harada, D., Russell, S.: Policy invariance under reward transformations: theory and application to reward shaping. In: ICML, vol. 99, pp. 278–287 (1999)
Nichol, A., Pfau, V., Hesse, C., Klimov, O., Schulman, J.: Gotta learn fast: a new benchmark for generalization in RL. arXiv preprint arXiv:1804.03720 (2018)
Paszke, A., et al.: Automatic differentiation in PyTorch. In: Proceedings of the Conference on Neural Information Processing Systems (NIPS) (2017)
Pathak, D., Agrawal, P., Efros, A.A., Darrell, T.: Curiosity-driven exploration by self-supervised prediction. In: International Conference on Machine Learning, pp. 2778–2787. PMLR (2017)
Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., Dormann, N.: Stable-baselines3: reliable reinforcement learning implementations. J. Mach. Learn. Res. 22(268), 1–8 (2021)
Rosten, E., Drummond, T.: Machine learning for high-speed corner detection. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 430–443. Springer, Heidelberg (2006). https://doi.org/10.1007/11744023_34
Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: ORB: an efficient alternative to SIFT or SURF. In: 2011 International Conference on Computer Vision, pp. 2564–2571. IEEE (2011)
Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897. PMLR (2015)
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)
Shao, K., Tang, Z., Zhu, Y., Li, N., Zhao, D.: A survey of deep reinforcement learning in video games. arXiv preprint arXiv:1912.10944 (2019)
Silver, D., et al.: Mastering the game of go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT press, Cambridge (2018)
Acknowledgements
The authors thank CNPq (316801/2021-6), Capes, FAPEMIG (APQ-01832-22), and UFJF for their financial support.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
de Souza, F.R., Miranda, T.S., Bernardino, H.S. (2022). A Reward Function Using Image Processing for a Deep Reinforcement Learning Approach Applied to the Sonic the Hedgehog Game. In: Xavier-Junior, J.C., Rios, R.A. (eds) Intelligent Systems. BRACIS 2022. Lecture Notes in Computer Science, vol. 13654. Springer, Cham. https://doi.org/10.1007/978-3-031-21689-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21688-6
Online ISBN: 978-3-031-21689-3