Unsupervised Salient Patch Selection for Data-Efficient Reinforcement Learning

  • Conference paper
Machine Learning and Knowledge Discovery in Databases: Research Track (ECML PKDD 2023)

Abstract

To improve the sample efficiency of vision-based deep reinforcement learning (RL), we propose a novel method, called SPIRL, to automatically extract important patches from input images. Following Masked Auto-Encoders, SPIRL is based on Vision Transformer models pre-trained in a self-supervised fashion to reconstruct images from randomly-sampled patches. These pre-trained models can then be exploited to detect and select salient patches, defined as hard to reconstruct from neighboring patches. In RL, the SPIRL agent processes selected salient patches via an attention module. We empirically validate SPIRL on Atari games to test its data-efficiency against relevant state-of-the-art methods, including some traditional model-based methods and keypoint-based models. In addition, we analyze our model’s interpretability capabilities.
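
The abstract outlines the core mechanism: a masked auto-encoder (MAE) is pre-trained to reconstruct frames from a subset of patches, and patches that are hard to reconstruct from their neighbors are kept as the salient input for the RL agent. Below is a minimal, hypothetical sketch of that selection step in PyTorch; it is not the authors' implementation, and the helper `mae_reconstruct`, the patch size, and the budget `k` are illustrative assumptions.

```python
import torch

def patchify(images: torch.Tensor, patch_size: int = 8) -> torch.Tensor:
    """Split a batch of images (B, C, H, W) into flattened non-overlapping
    patches of shape (B, N, C * patch_size * patch_size)."""
    b, c, h, w = images.shape
    patches = images.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    # (B, C, H/p, W/p, p, p) -> (B, N, C*p*p)
    return patches.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * patch_size * patch_size)

def select_salient_patches(images, mae_reconstruct, patch_size=8, k=16):
    """Rank patches by how poorly a pre-trained masked auto-encoder
    reconstructs them, and return the indices of the k hardest patches.

    Patches that cannot be inferred from their surroundings carry
    non-redundant information, which is the notion of saliency used here."""
    with torch.no_grad():
        recon = mae_reconstruct(images)                     # (B, C, H, W)
    target = patchify(images, patch_size)                   # (B, N, D)
    pred = patchify(recon, patch_size)                       # (B, N, D)
    per_patch_error = ((pred - target) ** 2).mean(dim=-1)   # (B, N)
    salient_idx = per_patch_error.topk(k, dim=-1).indices   # (B, k)
    return salient_idx

# Toy usage with an identity-plus-noise "reconstructor" standing in for a
# real pre-trained MAE:
if __name__ == "__main__":
    frames = torch.rand(2, 1, 64, 64)                       # e.g. grayscale game frames
    fake_mae = lambda x: x + 0.1 * torch.randn_like(x)
    idx = select_salient_patches(frames, fake_mae, patch_size=8, k=16)
    print(idx.shape)                                        # torch.Size([2, 16])
```

In the method described by the abstract, the selected indices would be used to gather the corresponding patch embeddings, which the agent's attention module consumes instead of the full frame.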

Partially supported by the National Natural Science Foundation of China (No. 62176154).

Notes

  1. Code and appendix are available at https://github.com/AdaptiveAutonomousAgents/SPIRL.


Author information

Corresponding author

Correspondence to Paul Weng.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Jiang, Z., Weng, P. (2023). Unsupervised Salient Patch Selection for Data-Efficient Reinforcement Learning. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14172. Springer, Cham. https://doi.org/10.1007/978-3-031-43421-1_33

  • DOI: https://doi.org/10.1007/978-3-031-43421-1_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43420-4

  • Online ISBN: 978-3-031-43421-1

  • eBook Packages: Computer Science, Computer Science (R0)
