Abstract
Reinforcement learning (RL) has been shown to learn sophisticated control policies for complex tasks, including games, robotics, heating and cooling systems, and text generation. The action-perception cycle in RL, however, generally assumes that a measurement of the state of the environment is available at each time step at no cost. In applications such as materials design, deep-sea and planetary robot exploration, and medicine, by contrast, there can be a high cost associated with measuring, or even approximating, the state of the environment. In this paper, we survey the growing literature that adopts the perspective that an RL agent might not need, or even want, a costly measurement at each time step. Within this context, we propose the Deep Dynamic Multi-Step Observationless Agent (DMSOA), contrast it with the literature, and empirically evaluate it on OpenAI Gym and Atari Pong environments. Our results show that DMSOA learns a better policy with fewer decision steps and measurements than the considered alternative from the literature. The corresponding code is available at: https://github.com/cbellinger27/Learning-when-to-observe-in-RL.
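The setting can be illustrated with a minimal sketch: an agent chooses both a control action and a number of steps to execute it without requesting a new measurement, and an observation cost is charged only when the environment is actually measured. The wrapper below is a hypothetical illustration of such an observation-cost environment, not the paper's implementation; the class name and the obs_cost and max_repeat parameters are assumptions for the example, and the classic OpenAI Gym step API (four return values) is assumed.

import gym
import numpy as np

class CostlyObservationWrapper(gym.Wrapper):
    # Hypothetical sketch: the agent submits (action, repeat); the action is
    # executed for `repeat` steps without observing, and a single measurement
    # cost is charged when the resulting observation is returned.
    def __init__(self, env, obs_cost=0.1, max_repeat=4):
        super().__init__(env)
        self.obs_cost = obs_cost      # penalty charged per measurement
        self.max_repeat = max_repeat  # longest observationless run allowed

    def step(self, action_and_repeat):
        action, repeat = action_and_repeat
        repeat = int(np.clip(repeat, 1, self.max_repeat))
        total_reward, done, obs, info = 0.0, False, None, {}
        for _ in range(repeat):       # act without observing
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        total_reward -= self.obs_cost  # one measurement at the end of the run
        return obs, total_reward, done, info

Under such a wrapper, the agent's decision becomes a pair (control action, repeat count), so longer observationless runs trade measurement savings against acting on stale information.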
Acknowledgments
This work was supported with funding from the National Research Council of Canada’s AI for Design Program.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bellinger, C., Tamblyn, I., Crowley, M. (2025). Learning When to Observe: A Frugal Reinforcement Learning Framework for a High-Cost World. In: Meo, R., Silvestri, F. (eds) Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2023. Communications in Computer and Information Science, vol 2136. Springer, Cham. https://doi.org/10.1007/978-3-031-74640-6_18
DOI: https://doi.org/10.1007/978-3-031-74640-6_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-74639-0
Online ISBN: 978-3-031-74640-6