Abstract
To improve the sample efficiency of vision-based deep reinforcement learning (RL), we propose a novel method, called SPIRL, to automatically extract important patches from input images. Following Masked Auto-Encoders, SPIRL is based on Vision Transformer models pre-trained in a self-supervised fashion to reconstruct images from randomly-sampled patches. These pre-trained models can then be exploited to detect and select salient patches, defined as patches that are hard to reconstruct from their neighboring patches. In RL, the SPIRL agent processes the selected salient patches via an attention module. We empirically validate SPIRL on Atari games, testing its data efficiency against relevant state-of-the-art methods, including traditional model-based and keypoint-based approaches. In addition, we analyze our model's interpretability capabilities.
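To make the selection step concrete, below is a minimal Python/PyTorch sketch of the salient-patch idea as the abstract describes it: each patch is scored by how poorly it can be reconstructed when masked out, and the k highest-error patches are kept. This is an illustration under stated assumptions, not the authors' implementation; `mean_reconstruct` is a hypothetical stand-in for the pre-trained MAE/ViT decoder, and all names are our own.

```python
import torch


def mean_reconstruct(visible: torch.Tensor, masked_index: int) -> torch.Tensor:
    """Toy stand-in for a pre-trained MAE decoder: predict the masked
    patch as the mean of the visible patches (ignores position)."""
    return visible.mean(dim=0)


def saliency_scores(patches: torch.Tensor, reconstruct=mean_reconstruct) -> torch.Tensor:
    """patches: (N, D) flattened image patches. Score each patch by the
    MSE between the patch and its reconstruction when it is masked out."""
    scores = torch.empty(patches.size(0))
    for i in range(patches.size(0)):
        visible = torch.cat([patches[:i], patches[i + 1:]])  # hide patch i
        pred = reconstruct(visible, masked_index=i)
        scores[i] = torch.mean((pred - patches[i]) ** 2)
    return scores


def select_salient(patches: torch.Tensor, k: int, reconstruct=mean_reconstruct) -> torch.Tensor:
    """Keep the k patches that are hardest to reconstruct."""
    idx = torch.topk(saliency_scores(patches, reconstruct), k).indices
    return patches[idx]


# Example: 49 patches (a 7x7 grid) of dimension 48; keep the 8 most salient.
patches = torch.rand(49, 48)
salient = select_salient(patches, k=8)
assert salient.shape == (8, 48)
```

In SPIRL itself, the reconstruction would come from the self-supervised Vision Transformer rather than a mean baseline, and the embeddings of the selected patches would then be processed by the RL agent's attention module.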
Partially supported by the National Natural Science Foundation of China (No. 62176154).
Notes
1. Code and appendix are available at https://github.com/AdaptiveAutonomousAgents/SPIRL.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Jiang, Z., Weng, P. (2023). Unsupervised Salient Patch Selection for Data-Efficient Reinforcement Learning. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds.) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol. 14172. Springer, Cham. https://doi.org/10.1007/978-3-031-43421-1_33
Print ISBN: 978-3-031-43420-4
Online ISBN: 978-3-031-43421-1