Abstract
Markov Decision Processes (MDPs) are used by the reinforcement learning and control communities as a framework for endowing artificial agents with the ability to make autonomous decisions. Control as Inference (CaI) is a related research direction that aims to recast optimal decision making as an instance of probabilistic inference, with the dual hope of encouraging exploration and simplifying calculations. Active Inference (AIF) is a sibling theory conforming to similar directives. Notably, AIF also entertains a procedure for perception and proprioception, which is currently lacking from the CaI theory. Recent work has established an explicit connection between CaI and MDPs. In particular, it was shown that the CaI policy can be iterated recursively, ultimately retrieving the associated MDP policy. In this work, these results are generalized to Partially Observable Markov Decision Processes (POMDPs), which, apart from a procedure for making optimal decisions, now also entertain a procedure for model-based perception and proprioception. By extending the theory of CaI to the context of optimal decision making under partial observability, we aim to further our understanding of, and illuminate the relationship between, these different frameworks.
Notes
- 1.
Applying to the whole history of the system.
- 2.
That is, by conditioning the present action on future auxiliary observation variables. The exact technical details deviate somewhat from this verbal exposition, but it captures the gist of the idea.
- 3.
In fact, there is a surjection from the set populated by \(w_t\) onto the set of belief functions, \(p(x_t|w_t)\), defined on the state space.
- 4.
Note that we could have introduced this formulation at the very beginning and optimized for \(\pi _t\) rather than \(u_t\). Formally this is equivalent, since the set of all densities also contains the set of all deterministic functions. Moreover, this would have saved us the trouble of explaining why the decision variables contained in \(w_t\) are treated differently than the decision variable \(u_t\). It is now clear that this is because we do not optimize the decision variable, \(u_t\), itself but rather the policy, \(\pi _t\).
- 5.
It is rather difficult to give a convincing justification for this model; it should instead be understood as a technical trick.
- 6.
Both projection strategies rely on the relative entropy, or Kullback-Leibler divergence, \(\mathbb {D}[\pi ||\rho ]\). The relative entropy is a divergence rather than a distance and is thus asymmetric in its arguments. Therefore the I-projection and the M-projection do not yield the same result [3, 15]: they produce either mode-seeking or mass-covering behaviour for \(\pi \). As a result, the I-projection will underestimate the support of \(\rho \), and vice versa.
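The construction alluded to in note 2 can be illustrated with a toy calculation. In this sketch (a single-step setting with illustrative rewards and a uniform prior, not the paper's actual derivation), an auxiliary binary "optimality" observation \(O\) is introduced with likelihood proportional to the exponentiated reward, and the action is conditioned on \(O=1\) via Bayes' rule:

```python
import numpy as np

# Toy sketch (assumed single-step setting): condition the action on an
# auxiliary optimality variable O with p(O=1|u) proportional to exp(r(u)).
r = np.array([1.0, 0.0, -1.0])   # illustrative rewards of three actions
prior = np.full(3, 1.0 / 3.0)    # uniform action prior p(u)
lik = np.exp(r - r.max())        # p(O=1|u), up to a constant
posterior = prior * lik          # unnormalised p(u|O=1)
posterior /= posterior.sum()     # normalise: a softmax over the rewards
```

The resulting posterior policy concentrates on high-reward actions while retaining mass on the others, which is the exploration-inciting behaviour the abstract attributes to CaI.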
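Note 3 can likewise be made concrete: an information state \(w_t\) (here, simply the observation history) induces a belief \(p(x_t|w_t)\) over the hidden state via recursive Bayesian filtering. The transition and observation matrices below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Sketch for note 3: the history w_t maps onto a belief p(x_t|w_t)
# through a Bayes filter. T and O below are assumed for illustration.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])            # T[i, j] = p(x'=j | x=i)
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])            # O[j, y] = p(y | x'=j)

def belief_update(b, y):
    """One Bayes-filter step: predict with T, correct with observation y."""
    b_pred = b @ T                    # prediction
    b_new = b_pred * O[:, y]          # correction
    return b_new / b_new.sum()

b = np.array([0.5, 0.5])              # initial belief
for y in [0, 0, 1]:                   # an observation history (part of w_t)
    b = belief_update(b, y)
```

Two distinct histories can induce the same belief, which is why the map from information states to belief functions is a surjection rather than a bijection.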
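The asymmetry discussed in note 6 can be checked numerically. The sketch below assumes a bimodal target distribution and two candidate approximations (a point-like density on one mode and a broad density covering both); the I-projection criterion \(\mathbb {D}[q||\pi ]\) prefers the mode-seeking candidate, while the M-projection criterion \(\mathbb {D}[\pi ||q]\) prefers the mass-covering one:

```python
import numpy as np

# Numerical sketch of note 6 with an assumed bimodal target pi:
# I-projection (min D[q||pi]) is mode seeking, M-projection
# (min D[pi||q]) is mass covering, and D is asymmetric.
def kl(p, q):
    """Relative entropy D[p||q] for discrete distributions."""
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

pi = np.array([0.49, 0.01, 0.01, 0.49])                 # bimodal target
eps = 0.01
point = np.array([1 - eps, eps / 3, eps / 3, eps / 3])  # mass on one mode
broad = np.full(4, 0.25)                                # covers both modes

i_prefers_point = kl(point, pi) < kl(broad, pi)   # I-projection: mode seeking
m_prefers_broad = kl(pi, broad) < kl(pi, point)   # M-projection: covering
asymmetric = kl(pi, point) != kl(point, pi)
```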
References
1. Abdolmaleki, A., Springenberg, J., Tassa, Y., Munos, R., Heess, N., Riedmiller, M.: Maximum a posteriori policy optimisation. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=S1ANxQW0b
2. Attias, H.: Planning by probabilistic inference. In: International Workshop on Artificial Intelligence and Statistics, pp. 9–16. PMLR (2003)
3. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
4. Da Costa, L., Parr, T., Sajid, N., Veselic, S., Neacsu, V., Friston, K.: Active inference on discrete state-spaces: a synthesis. J. Math. Psychol. 99, 102447 (2020)
5. Da Costa, L., Sajid, N., Parr, T., Friston, K., Smith, R.: Reward maximization through discrete active inference. Neural Comput. 35(5), 807–852 (2023). https://doi.org/10.1162/neco_a_01574
6. Hennig, P., Osborne, M., Girolami, M.: Probabilistic numerics and uncertainty in computations. Proc. R. Soc. A Math. Phys. Eng. Sci. 471(2179), 20150142 (2015)
7. Hoffmann, C., Rostalski, P.: Linear optimal control on factor graphs: a message passing perspective. IFAC-PapersOnLine 50(1), 6314–6319 (2017)
8. Kappen, H.J., Gómez, V., Opper, M.: Optimal control as a graphical model inference problem. Mach. Learn. 87(2), 159–182 (2012). https://doi.org/10.1007/s10994-012-5278-7
9. Kárný, M.: Towards fully probabilistic control design. Automatica 32(12), 1719–1722 (1996)
10. Kárný, M., Guy, T.V.: Fully probabilistic control design. Syst. Control Lett. 55(4), 259–265 (2006)
11. Lange, K.: MM Optimization Algorithms. SIAM, Philadelphia (2016)
12. Lefebvre, T.: A review of probabilistic control and majorization of optimal control. arXiv preprint arXiv:2205.03279 (2022). https://doi.org/10.48550/ARXIV.2205.03279
13. Levine, S.: Reinforcement learning and control as probabilistic inference: tutorial and review. arXiv preprint arXiv:1805.00909 (2018)
14. Millidge, B., Tschantz, A., Seth, A.K., Buckley, C.L.: On the relationship between active inference and control as inference. In: IWAI 2020. CCIS, vol. 1326, pp. 3–11. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-64919-7_1
15. Murphy, K.P.: Probabilistic Machine Learning: An Introduction. MIT Press, Cambridge (2022)
16. Murphy, K.P.: Probabilistic Machine Learning: Advanced Topics. MIT Press, Cambridge (2023)
17. Oates, C.J., Sullivan, T.J.: A modern retrospective on probabilistic numerics. Stat. Comput. 29(6), 1335–1351 (2019). https://doi.org/10.1007/s11222-019-09902-z
18. Särkkä, S.: Bayesian Filtering and Smoothing. Cambridge University Press, Cambridge (2013)
19. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)
20. Thrun, S.: Probabilistic robotics. Commun. ACM 45(3), 52–57 (2002)
21. Toussaint, M.: Robot trajectory optimization using approximate inference. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 1049–1056 (2009)
22. Toussaint, M., Storkey, A.: Probabilistic inference for solving discrete and continuous state Markov decision processes. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 945–952 (2006)
23. Whittle, P.: Optimal Control: Basics & Beyond. Wiley, Chichester (1996)
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lefebvre, T. (2024). Probabilistic Majorization of Partially Observable Markov Decision Processes. In: Buckley, C.L., et al. Active Inference. IWAI 2023. Communications in Computer and Information Science, vol 1915. Springer, Cham. https://doi.org/10.1007/978-3-031-47958-8_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47957-1
Online ISBN: 978-3-031-47958-8