Abstract
Markov Decision Processes (MDPs) are used by the reinforcement learning and control communities as a framework for endowing artificial agents with the ability to make autonomous decisions. Control as Inference (CaI) is a related research direction that aims to recast optimal decision making as an instance of probabilistic inference, with the dual hope of encouraging exploration and simplifying calculations. Active Inference (AIF) is a sibling theory conforming to similar directives. Notably, AIF also entertains a procedure for perception and proprioception, which is currently lacking from the CaI theory. Recent work has established an explicit connection between CaI and MDPs. In particular, it was shown that the CaI policy can be iterated recursively, ultimately retrieving the associated MDP policy. In this work, these results are generalized to Partially Observable Markov Decision Processes (POMDPs), which, apart from a procedure for making optimal decisions, now also entertain a procedure for model-based perception and proprioception. By extending the theory of CaI to the context of optimal decision making under partial observability, we aim to further our understanding of, and illuminate the relationship between, these different frameworks.
Notes
- 1.
Applying to the whole history of the system.
- 2.
That is, by conditioning the present action on future auxiliary observation variables. The exact technical details deviate somewhat from this verbal exposition, but it captures the gist of the idea.
- 3.
In fact, there is a surjection from the set populated by \(w_t\) onto the set of belief functions, \(p(x_t|w_t)\), defined on the state space.
- 4.
Note that we could have introduced this formulation at the very beginning and optimized for \(\pi _t\) rather than \(u_t\). Formally this is equivalent, since the set of all densities also contains the set of all deterministic functions. Moreover, this would have saved us the trouble of explaining why the decision variables contained in \(w_t\) are treated differently than the decision variable \(u_t\). It is now clear that this is because we do not optimize the decision variable, \(u_t\), itself but rather the policy, \(\pi _t\).
- 5.
It is rather difficult to give a convincing justification for this model; it should instead be understood as a technical trick.
- 6.
Both projection strategies rely on the relative entropy, or Kullback-Leibler divergence, \(\mathbb {D}[\pi ||\rho ]\). The relative entropy is a divergence rather than a distance and is thus asymmetric in its arguments. Therefore the I-projection and the M-projection do not yield the same result [3, 15]: they produce either mode-seeking or mass-covering behaviour for \(\pi \). As a result, the I-projection will underestimate the support of \(\rho \), and vice versa.
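The construction alluded to in note 2 can be illustrated with a toy calculation. In this sketch (a single-step setting with illustrative rewards and a uniform prior, not the paper's actual derivation), an auxiliary binary "optimality" observation \(O\) is introduced with likelihood proportional to the exponentiated reward, and the action is conditioned on \(O=1\) via Bayes' rule:

```python
import numpy as np

# Toy sketch (assumed single-step setting): condition the action on an
# auxiliary optimality variable O with p(O=1|u) proportional to exp(r(u)).
r = np.array([1.0, 0.0, -1.0])   # illustrative rewards of three actions
prior = np.full(3, 1.0 / 3.0)    # uniform action prior p(u)
lik = np.exp(r - r.max())        # p(O=1|u), up to a constant
posterior = prior * lik          # unnormalised p(u|O=1)
posterior /= posterior.sum()     # normalise: a softmax over the rewards
```

The resulting posterior policy concentrates on high-reward actions while retaining mass on the others, which is the exploration-inciting behaviour the abstract attributes to CaI.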
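Note 3 can likewise be made concrete: an information state \(w_t\) (here, simply the observation history) induces a belief \(p(x_t|w_t)\) over the hidden state via recursive Bayesian filtering. The transition and observation matrices below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Sketch for note 3: the history w_t maps onto a belief p(x_t|w_t)
# through a Bayes filter. T and O below are assumed for illustration.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])            # T[i, j] = p(x'=j | x=i)
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])            # O[j, y] = p(y | x'=j)

def belief_update(b, y):
    """One Bayes-filter step: predict with T, correct with observation y."""
    b_pred = b @ T                    # prediction
    b_new = b_pred * O[:, y]          # correction
    return b_new / b_new.sum()

b = np.array([0.5, 0.5])              # initial belief
for y in [0, 0, 1]:                   # an observation history (part of w_t)
    b = belief_update(b, y)
```

Two distinct histories can induce the same belief, which is why the map from information states to belief functions is a surjection rather than a bijection.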
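The asymmetry discussed in note 6 can be checked numerically. The sketch below assumes a bimodal target distribution and two candidate approximations (a point-like density on one mode and a broad density covering both); the I-projection criterion \(\mathbb {D}[q||\pi ]\) prefers the mode-seeking candidate, while the M-projection criterion \(\mathbb {D}[\pi ||q]\) prefers the mass-covering one:

```python
import numpy as np

# Numerical sketch of note 6 with an assumed bimodal target pi:
# I-projection (min D[q||pi]) is mode seeking, M-projection
# (min D[pi||q]) is mass covering, and D is asymmetric.
def kl(p, q):
    """Relative entropy D[p||q] for discrete distributions."""
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

pi = np.array([0.49, 0.01, 0.01, 0.49])                 # bimodal target
eps = 0.01
point = np.array([1 - eps, eps / 3, eps / 3, eps / 3])  # mass on one mode
broad = np.full(4, 0.25)                                # covers both modes

i_prefers_point = kl(point, pi) < kl(broad, pi)   # I-projection: mode seeking
m_prefers_broad = kl(pi, broad) < kl(pi, point)   # M-projection: covering
asymmetric = kl(pi, point) != kl(point, pi)
```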
References
1. Abdolmaleki, A., Springenberg, J., Tassa, Y., Munos, R., Heess, N., Riedmiller, M.: Maximum a posteriori policy optimisation. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=S1ANxQW0b
2. Attias, H.: Planning by probabilistic inference. In: International Workshop on Artificial Intelligence and Statistics, pp. 9–16. PMLR (2003)
3. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
4. Da Costa, L., Parr, T., Sajid, N., Veselic, S., Neacsu, V., Friston, K.: Active inference on discrete state-spaces: a synthesis. J. Math. Psychol. 99, 102447 (2020)
5. Da Costa, L., Sajid, N., Parr, T., Friston, K., Smith, R.: Reward maximization through discrete active inference. Neural Comput. 35(5), 807–852 (2023). https://doi.org/10.1162/neco_a_01574
6. Hennig, P., Osborne, M., Girolami, M.: Probabilistic numerics and uncertainty in computations. Proc. R. Soc. A Math. Phys. Eng. Sci. 471(2179), 20150142 (2015)
7. Hoffmann, C., Rostalski, P.: Linear optimal control on factor graphs: a message passing perspective. IFAC-PapersOnLine 50(1), 6314–6319 (2017)
8. Kappen, H.J., Gómez, V., Opper, M.: Optimal control as a graphical model inference problem. Mach. Learn. 87(2), 159–182 (2012). https://doi.org/10.1007/s10994-012-5278-7
9. Kárný, M.: Towards fully probabilistic control design. Automatica 32(12), 1719–1722 (1996)
10. Kárný, M., Guy, T.V.: Fully probabilistic control design. Syst. Control Lett. 55(4), 259–265 (2006)
11. Lange, K.: MM Optimization Algorithms. SIAM, Philadelphia (2016)
12. Lefebvre, T.: A review of probabilistic control and majorization of optimal control. arXiv preprint arXiv:2205.03279 (2022). https://doi.org/10.48550/ARXIV.2205.03279
13. Levine, S.: Reinforcement learning and control as probabilistic inference: tutorial and review. arXiv preprint arXiv:1805.00909 (2018)
14. Millidge, B., Tschantz, A., Seth, A.K., Buckley, C.L.: On the relationship between active inference and control as inference. In: IWAI 2020. CCIS, vol. 1326, pp. 3–11. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-64919-7_1
15. Murphy, K.P.: Probabilistic Machine Learning: An Introduction. MIT Press, Cambridge (2022)
16. Murphy, K.P.: Probabilistic Machine Learning: Advanced Topics. MIT Press, Cambridge (2023)
17. Oates, C.J., Sullivan, T.J.: A modern retrospective on probabilistic numerics. Stat. Comput. 29(6), 1335–1351 (2019). https://doi.org/10.1007/s11222-019-09902-z
18. Särkkä, S.: Bayesian Filtering and Smoothing. Cambridge University Press, Cambridge (2013)
19. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)
20. Thrun, S.: Probabilistic robotics. Commun. ACM 45(3), 52–57 (2002)
21. Toussaint, M.: Robot trajectory optimization using approximate inference. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 1049–1056 (2009)
22. Toussaint, M., Storkey, A.: Probabilistic inference for solving discrete and continuous state Markov decision processes. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 945–952 (2006)
23. Whittle, P.: Optimal Control: Basics & Beyond. Wiley, Chichester (1996)
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lefebvre, T. (2024). Probabilistic Majorization of Partially Observable Markov Decision Processes. In: Buckley, C.L., et al. Active Inference. IWAI 2023. Communications in Computer and Information Science, vol 1915. Springer, Cham. https://doi.org/10.1007/978-3-031-47958-8_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47957-1
Online ISBN: 978-3-031-47958-8