A Novel State Space Exploration Method for the Sparse-Reward Reinforcement Learning Environment

  • Conference paper
  • In: Artificial Intelligence XL (SGAI 2023)

Abstract

Sparse-reward reinforcement learning environments pose a particular challenge because the agent receives rewards only infrequently, making it difficult to learn an optimal policy. In this paper, we propose NSSE, a novel approach that combines stratified state space exploration with prioritised sweeping to enhance the informativeness of learning, thus enabling fast learning convergence. We evaluate NSSE on three typical sparse-reward Atari environments. The results demonstrate that our state space exploration method exhibits strong performance compared to two baseline algorithms: Deep Q-Network (DQN) and noisy Deep Q-Network (Noisy DQN).

Dr Yang is supported by the Royal Academy of Engineering SHE project RAEng (IF2223-172) and the Royal Society of Edinburgh (961_Yang).
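The abstract names prioritised sweeping as one ingredient of NSSE, but the paper's own implementation is not reproduced on this page. As a point of reference only, the sketch below shows the standard tabular form of prioritised sweeping (in the style described by Sutton and Barto), where model-based backups are ordered by the magnitude of their temporal-difference error so that a sparse reward propagates backwards through the learned model quickly. The function name, hyperparameters, and the assumption of a Gymnasium-style environment with discrete, hashable states are illustrative choices, not the authors' NSSE code; the stratified exploration component is not shown.

```python
import heapq
import random
from collections import defaultdict
from itertools import count


def prioritized_sweeping(env, episodes=100, alpha=0.5, gamma=0.95,
                         theta=1e-4, n_planning=20, epsilon=0.1):
    """Tabular prioritised sweeping: model-based backups are popped from a
    priority queue ordered by TD-error magnitude, so a sparse reward
    propagates backwards through the learned model quickly."""
    Q = defaultdict(float)            # Q[(s, a)] -> action-value estimate
    model = {}                        # model[(s, a)] -> (reward, next state, done)
    predecessors = defaultdict(set)   # next state -> {(s, a) pairs that reach it}
    pqueue, tie = [], count()         # max-heap via negated priorities
    actions = list(range(env.action_space.n))

    def q_max(s):
        return max(Q[(s, a)] for a in actions)

    def target(r, s_next, done):
        # One-step backup target; no bootstrap from terminal states.
        return r if done else r + gamma * q_max(s_next)

    for _ in range(episodes):
        s, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy behaviour policy.
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])
            s_next, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated

            model[(s, a)] = (r, s_next, done)
            predecessors[s_next].add((s, a))

            # Priority of (s, a) is the magnitude of its TD error.
            p = abs(target(r, s_next, done) - Q[(s, a)])
            if p > theta:
                heapq.heappush(pqueue, (-p, next(tie), (s, a)))

            # Planning: apply the most urgent backups first.
            for _ in range(n_planning):
                if not pqueue:
                    break
                _, _, (ps, pa) = heapq.heappop(pqueue)
                pr, ps_next, p_done = model[(ps, pa)]
                Q[(ps, pa)] += alpha * (target(pr, ps_next, p_done) - Q[(ps, pa)])
                # Predecessors of ps may now have large TD errors too.
                for (qs, qa) in predecessors[ps]:
                    qr, _, q_done = model[(qs, qa)]
                    qp = abs(target(qr, ps, q_done) - Q[(qs, qa)])
                    if qp > theta:
                        heapq.heappush(pqueue, (-qp, next(tie), (qs, qa)))
            s = s_next
    return Q
```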


Author information


Corresponding author

Correspondence to Shufan Yang.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, X. et al. (2023). A Novel State Space Exploration Method for the Sparse-Reward Reinforcement Learning Environment. In: Bramer, M., Stahl, F. (eds) Artificial Intelligence XL. SGAI 2023. Lecture Notes in Computer Science, vol 14381. Springer, Cham. https://doi.org/10.1007/978-3-031-47994-6_18

  • DOI: https://doi.org/10.1007/978-3-031-47994-6_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47993-9

  • Online ISBN: 978-3-031-47994-6

  • eBook Packages: Computer Science, Computer Science (R0)
