
Learning When to Observe: A Frugal Reinforcement Learning Framework for a High-Cost World

  • Conference paper
  • First Online:
Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2023)

Abstract

Reinforcement learning (RL) has been shown to learn sophisticated control policies for complex tasks including games, robotics, heating and cooling systems, and text generation. The action-perception cycle in RL, however, generally assumes that a measurement of the state of the environment is available at each time step without cost. In applications such as materials design, deep-sea and planetary robot exploration, and medicine, by contrast, measuring, or even approximating, the state of the environment can be very costly. In this paper, we survey the recently growing literature that adopts the perspective that an RL agent might not need, or even want, a costly measurement at each time step. Within this context, we propose the Deep Dynamic Multi-Step Observationless Agent (DMSOA), contrast it with the literature, and empirically evaluate it on OpenAI Gym and Atari Pong environments. Our results show that DMSOA learns a better policy with fewer decision steps and measurements than the considered alternative from the literature. The corresponding code is available at: https://github.com/cbellinger27/Learning-when-to-observe-in-RL.
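
The cost-aware setting described in the abstract can be made concrete with a small environment wrapper: the agent's action is augmented with the number of steps to execute without requesting a new observation, and a fixed penalty is charged whenever a fresh measurement is taken. The sketch below is illustrative only; the wrapper name, cost value, skip encoding, and the classic 4-tuple Gym step API are assumptions for the example, not the DMSOA implementation from the linked repository.

```python
import gym
import numpy as np


class CostlyObservationWrapper(gym.Wrapper):
    """Illustrative wrapper (not the authors' code): the agent picks a
    control action plus how many steps to repeat it without observing.
    A fixed cost is subtracted from the reward each time a fresh
    measurement of the state is returned."""

    def __init__(self, env, obs_cost=0.1, max_skip=4):
        super().__init__(env)
        self.obs_cost = obs_cost
        self.max_skip = max_skip

    def step(self, action_and_skip):
        action, n_steps = action_and_skip
        n_steps = int(np.clip(n_steps, 1, self.max_skip))
        total_reward, done, info, obs = 0.0, False, {}, None
        for _ in range(n_steps):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        # Only the final observation is returned, and only it incurs a cost.
        total_reward -= self.obs_cost
        return obs, total_reward, done, info


if __name__ == "__main__":
    env = CostlyObservationWrapper(gym.make("CartPole-v1"), obs_cost=0.1)
    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()   # placeholder policy
        n_steps = np.random.randint(1, 5)    # how long to act without observing
        obs, reward, done, _ = env.step((action, n_steps))
```

In this sketch the per-measurement penalty plays the same role as the observation cost discussed in the paper: a policy that requests fewer measurements (larger skip values) trades state information for reward, which is the trade-off that DMSOA and the surveyed methods aim to balance.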


Notes

  1. https://www.youtube.com/playlist?list=PLr6sWY5moZhFtTuCBbIjb4cZQOZkbjkOV


Acknowledgments

This work was supported with funding from the National Research Council of Canada’s AI for Design Program.

Author information


Corresponding author

Correspondence to Colin Bellinger.



Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Bellinger, C., Tamblyn, I., Crowley, M. (2025). Learning When to Observe: A Frugal Reinforcement Learning Framework for a High-Cost World. In: Meo, R., Silvestri, F. (eds) Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2023. Communications in Computer and Information Science, vol 2136. Springer, Cham. https://doi.org/10.1007/978-3-031-74640-6_18


  • DOI: https://doi.org/10.1007/978-3-031-74640-6_18

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-74639-0

  • Online ISBN: 978-3-031-74640-6

  • eBook Packages: Artificial Intelligence (R0)
