Abstract
Reinforcement learning (RL) has been shown to learn sophisticated control policies for complex tasks, including games, robotics, heating and cooling systems, and text generation. The action-perception cycle in RL, however, generally assumes that a measurement of the state of the environment is available at each time step at no cost. In applications such as materials design, deep-sea and planetary robot exploration, and medicine, by contrast, there can be a high cost associated with measuring, or even approximating, the state of the environment. In this paper, we survey the growing literature that adopts the perspective that an RL agent might not need, or even want, a costly measurement at each time step. Within this context, we propose the Deep Dynamic Multi-Step Observationless Agent (DMSOA), contrast it with the literature, and empirically evaluate it on OpenAI Gym and Atari Pong environments. Our results show that DMSOA learns a better policy with fewer decision steps and measurements than the considered alternative from the literature. The corresponding code is available at: https://github.com/cbellinger27/Learning-when-to-observe-in-RL.
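The setting can be illustrated with a minimal sketch: an agent chooses both a control action and a number of steps to execute it without requesting a new measurement, and an observation cost is charged only when the environment is actually measured. The wrapper below is a hypothetical illustration of such an observation-cost environment, not the paper's implementation; the class name and the obs_cost and max_repeat parameters are assumptions for the example, and the classic OpenAI Gym step API (four return values) is assumed.

import gym
import numpy as np

class CostlyObservationWrapper(gym.Wrapper):
    # Hypothetical sketch: the agent submits (action, repeat); the action is
    # executed for `repeat` steps without observing, and a single measurement
    # cost is charged when the resulting observation is returned.
    def __init__(self, env, obs_cost=0.1, max_repeat=4):
        super().__init__(env)
        self.obs_cost = obs_cost      # penalty charged per measurement
        self.max_repeat = max_repeat  # longest observationless run allowed

    def step(self, action_and_repeat):
        action, repeat = action_and_repeat
        repeat = int(np.clip(repeat, 1, self.max_repeat))
        total_reward, done, obs, info = 0.0, False, None, {}
        for _ in range(repeat):       # act without observing
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        total_reward -= self.obs_cost  # one measurement at the end of the run
        return obs, total_reward, done, info

Under such a wrapper, the agent's decision becomes a pair (control action, repeat count), so longer observationless runs trade measurement savings against acting on stale information.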
Acknowledgments
This work was supported with funding from the National Research Council of Canada’s AI for Design Program.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bellinger, C., Tamblyn, I., Crowley, M. (2025). Learning When to Observe: A Frugal Reinforcement Learning Framework for a High-Cost World. In: Meo, R., Silvestri, F. (eds) Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2023. Communications in Computer and Information Science, vol 2136. Springer, Cham. https://doi.org/10.1007/978-3-031-74640-6_18
DOI: https://doi.org/10.1007/978-3-031-74640-6_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-74639-0
Online ISBN: 978-3-031-74640-6