Unsupervised Salient Patch Selection for Data-Efficient Reinforcement Learning

  • Conference paper
Machine Learning and Knowledge Discovery in Databases: Research Track (ECML PKDD 2023)

Abstract

To improve the sample efficiency of vision-based deep reinforcement learning (RL), we propose a novel method, called SPIRL, to automatically extract important patches from input images. Following Masked Auto-Encoders, SPIRL is based on Vision Transformer models pre-trained in a self-supervised fashion to reconstruct images from randomly-sampled patches. These pre-trained models can then be exploited to detect and select salient patches, defined as hard to reconstruct from neighboring patches. In RL, the SPIRL agent processes selected salient patches via an attention module. We empirically validate SPIRL on Atari games to test its data-efficiency against relevant state-of-the-art methods, including some traditional model-based methods and keypoint-based models. In addition, we analyze our model’s interpretability capabilities.
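
The abstract outlines the core mechanism: a masked auto-encoder (MAE) is pre-trained to reconstruct frames from a subset of patches, and patches that are hard to reconstruct from their neighbors are kept as the salient input for the RL agent. Below is a minimal, hypothetical sketch of that selection step in PyTorch; it is not the authors' implementation, and the helper `mae_reconstruct`, the patch size, and the budget `k` are illustrative assumptions.

```python
import torch

def patchify(images: torch.Tensor, patch_size: int = 8) -> torch.Tensor:
    """Split a batch of images (B, C, H, W) into flattened non-overlapping
    patches of shape (B, N, C * patch_size * patch_size)."""
    b, c, h, w = images.shape
    patches = images.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    # (B, C, H/p, W/p, p, p) -> (B, N, C*p*p)
    return patches.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * patch_size * patch_size)

def select_salient_patches(images, mae_reconstruct, patch_size=8, k=16):
    """Rank patches by how poorly a pre-trained masked auto-encoder
    reconstructs them, and return the indices of the k hardest patches.

    Patches that cannot be inferred from their surroundings carry
    non-redundant information, which is the notion of saliency used here."""
    with torch.no_grad():
        recon = mae_reconstruct(images)                     # (B, C, H, W)
    target = patchify(images, patch_size)                   # (B, N, D)
    pred = patchify(recon, patch_size)                       # (B, N, D)
    per_patch_error = ((pred - target) ** 2).mean(dim=-1)   # (B, N)
    salient_idx = per_patch_error.topk(k, dim=-1).indices   # (B, k)
    return salient_idx

# Toy usage with an identity-plus-noise "reconstructor" standing in for a
# real pre-trained MAE:
if __name__ == "__main__":
    frames = torch.rand(2, 1, 64, 64)                       # e.g. grayscale game frames
    fake_mae = lambda x: x + 0.1 * torch.randn_like(x)
    idx = select_salient_patches(frames, fake_mae, patch_size=8, k=16)
    print(idx.shape)                                        # torch.Size([2, 16])
```

In the method described by the abstract, the selected indices would be used to gather the corresponding patch embeddings, which the agent's attention module consumes instead of the full frame.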

Partially supported by the National Natural Science Foundation of China (No. 62176154).

Notes

  1. Code and appendix are available at https://github.com/AdaptiveAutonomousAgents/SPIRL.


Author information

Corresponding author

Correspondence to Paul Weng.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Jiang, Z., Weng, P. (2023). Unsupervised Salient Patch Selection for Data-Efficient Reinforcement Learning. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14172. Springer, Cham. https://doi.org/10.1007/978-3-031-43421-1_33

  • DOI: https://doi.org/10.1007/978-3-031-43421-1_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43420-4

  • Online ISBN: 978-3-031-43421-1

  • eBook Packages: Computer Science, Computer Science (R0)
