Probabilistic Majorization of Partially Observable Markov Decision Processes

  • Conference paper
Active Inference (IWAI 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1915)

Abstract

Markov Decision Processes (MDPs) are wielded by the Reinforcement Learning and control communities as a framework to bestow artificial agents with the ability to make autonomous decisions. Control as Inference (CaI) is a tangent research direction that aims to recast optimal decision making as an instance of probabilistic inference, with the dual hope of inciting exploration and simplifying calculations. Active Inference (AIF) is a sibling theory conforming to similar directives. Notably, AIF also entertains a procedure for perception and proprioception, which is currently lacking from the CaI theory. Recent work has established an explicit connection between CaI and MDPs. In particular, it was shown that the CaI policy can be iterated recursively, ultimately retrieving the associated MDP policy. In this work, such results are generalized to Partially Observable Markov Decision Processes (POMDPs), which, apart from a procedure to make optimal decisions, now also entertain a procedure for model-based perception and proprioception. By extending the theory of CaI to the context of optimal decision making under partial observability, we aim to deepen our understanding of the relationship between these frameworks.
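Although the precise recursion is developed in the full text, the mechanism referenced in the abstract can be illustrated with a minimal sketch under assumed ingredients: repeatedly tilting a prior policy by exponentiated action values sharpens the soft, CaI-style policy until it collapses onto the greedy MDP policy. The action values and iteration count below are hypothetical illustration choices, not taken from the paper.

    import numpy as np

    # Hypothetical action values for a single state (illustration only).
    Q = np.array([1.0, 1.5, 0.2])

    # Uniform prior policy over the three actions.
    pi = np.ones_like(Q) / Q.size

    # Recursively tilt the policy by exp(Q) and renormalize: each pass
    # sharpens the soft policy; in the limit only the greedy policy remains.
    for _ in range(50):
        pi = pi * np.exp(Q)
        pi /= pi.sum()

    print(pi.round(3))  # -> [0. 1. 0.], all mass on argmax Q
    assert pi.argmax() == Q.argmax()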

Notes

  1.

    Applying to the whole history of the system.

  2.

    That is, by conditioning the present action on future auxiliary observation variables. The exact technical details deviate somewhat from this verbal exposition; nevertheless, it captures the gist of the idea.

  3.

    In fact, the set populated by \(w_t\) maps surjectively onto the set of belief functions, \(p(x_t|w_t)\), defined on the state space.

  4.

    Note that we could have introduced this formulation at the very beginning and optimized for \(\pi _t\) rather than \(u_t\). Formally this is equivalent, since the set of all densities also contains the set of all deterministic functions. Moreover, this would have saved us the trouble of explaining why the decision variables contained in \(w_t\) are treated differently than the decision variable \(u_t\). Now it is clear that this is because we do not optimize the decision variable, \(u_t\), itself but rather the policy, \(\pi _t\).

  5.

    It is rather difficult to give a convincing justification for this model. Rather, it should be understood as a technical trick.

  6.

    Both projection strategies rely on the relative entropy, or Kullback-Leibler divergence, \(\mathbb {D}[\pi ||\rho ]\). The relative entropy is a divergence rather than a distance and is therefore asymmetric in its arguments, so the I-projection and the M-projection do not yield the same result [3, 15]. They are either mode-seeking or mass-covering for \(\pi \). As a result, the I-projection will underestimate the support of \(\rho \), and vice versa.
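To make this asymmetry concrete, here is a minimal numerical sketch (illustrative only, not taken from the paper) that projects a bimodal mixture onto a single Gaussian under both divergence directions on a discretized grid. The target mixture, grid bounds, and search ranges are hypothetical choices for illustration.

    import numpy as np

    x = np.linspace(-6, 6, 2001)
    dx = x[1] - x[0]

    def gauss(mu, sig):
        g = np.exp(-0.5 * ((x - mu) / sig) ** 2)
        return g / (g.sum() * dx)  # normalize on the grid

    # Bimodal target: mixture of two well-separated Gaussians.
    p = 0.5 * gauss(-2.0, 0.5) + 0.5 * gauss(2.0, 0.5)

    # M-projection, argmin_q KL(p || q): for a Gaussian family this is
    # moment matching, so the result covers both modes.
    mu_m = (p * x).sum() * dx
    sig_m = np.sqrt((p * (x - mu_m) ** 2).sum() * dx)

    # I-projection, argmin_q KL(q || p): brute-force grid search; the
    # result avoids regions where p is small and locks onto one mode.
    def kl(q, r):
        mask = q > 1e-12
        return (q[mask] * np.log(q[mask] / r[mask])).sum() * dx

    _, mu_i, sig_i = min((kl(gauss(mu, sig), p), mu, sig)
                         for mu in np.linspace(-3, 3, 61)
                         for sig in np.linspace(0.2, 3.0, 57))

    print(f"M-projection: mu={mu_m:.2f}, sigma={sig_m:.2f}")  # ~N(0, 2.1), covers both modes
    print(f"I-projection: mu={mu_i:.2f}, sigma={sig_i:.2f}")  # ~N(+/-2, 0.5), one mode only

Running this shows the M-projection spreading its mass over both modes while the I-projection concentrates on a single mode, which is exactly the support-underestimation behaviour described above.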

References

  1. Abdolmaleki, A., Springenberg, J., Tassa, Y., Munos, R., Heess, N., Riedmiller, M.: Maximum a posteriori policy optimisation. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=S1ANxQW0b

  2. Attias, H.: Planning by probabilistic inference. In: International Workshop on Artificial Intelligence and Statistics, pp. 9–16. PMLR (2003)

  3. Bishop, C.M., Nasrabadi, N.M.: Pattern Recognition and Machine Learning, vol. 4, no. 4, p. 738. Springer, New York (2006)

  4. Da Costa, L., Parr, T., Sajid, N., Veselic, S., Neacsu, V., Friston, K.: Active inference on discrete state-spaces: a synthesis. J. Math. Psychol. 99, 102447 (2020)

  5. Da Costa, L., Sajid, N., Parr, T., Friston, K., Smith, R.: Reward maximization through discrete active inference. Neural Comput. 35(5), 807–852 (2023). https://doi.org/10.1162/neco_a_01574

  6. Hennig, P., Osborne, M., Girolami, M.: Probabilistic numerics and uncertainty in computations. Proc. R. Soc. A Math. Phys. Eng. Sci. 471(2179), 20150142 (2015)

  7. Hoffmann, C., Rostalski, P.: Linear optimal control on factor graphs - a message passing perspective. IFAC-PapersOnLine 50(1), 6314–6319 (2017)

  8. Kappen, H.J., Gómez, V., Opper, M.: Optimal control as a graphical model inference problem. Mach. Learn. 87(2), 159–182 (2012). https://doi.org/10.1007/s10994-012-5278-7

  9. Kárný, M.: Towards fully probabilistic control design. Automatica 32(12), 1719–1722 (1996)

  10. Kárný, M., Guy, T.V.: Fully probabilistic control design. Syst. Control Lett. 55(4), 259–265 (2006)

  11. Lange, K.: MM optimization algorithms. SIAM (2016)

  12. Lefebvre, T.: A review of probabilistic control and majorization of optimal control (2022). https://doi.org/10.48550/ARXIV.2205.03279

  13. Levine, S.: Reinforcement learning and control as probabilistic inference: tutorial and review. arXiv preprint arXiv:1805.00909 (2018)

  14. Millidge, B., Tschantz, A., Seth, A.K., Buckley, C.L.: On the relationship between active inference and control as inference. In: IWAI 2020. CCIS, vol. 1326, pp. 3–11. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-64919-7_1

  15. Murphy, K.P.: Probabilistic Machine Learning: An Introduction. MIT Press, Cambridge (2022)

  16. Murphy, K.P.: Probabilistic Machine Learning: Advanced Topics. MIT Press, Cambridge (2023)

  17. Oates, C.J., Sullivan, T.J.: A modern retrospective on probabilistic numerics. Stat. Comput. 29(6), 1335–1351 (2019). https://doi.org/10.1007/s11222-019-09902-z

  18. Särkkä, S.: Bayesian Filtering and Smoothing, no. 3. Cambridge University Press, Cambridge (2013)

  19. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)

  20. Thrun, S.: Probabilistic robotics. Commun. ACM 45(3), 52–57 (2002)

  21. Toussaint, M.: Robot trajectory optimization using approximate inference. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 1049–1056 (2009)

  22. Toussaint, M., Storkey, A.: Probabilistic inference for solving discrete and continuous state Markov decision processes. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 945–952 (2006)

  23. Whittle, P.: Optimal Control: Basics & Beyond. Wiley, Chichester (1996)

Author information

Corresponding author

Correspondence to Tom Lefebvre.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Lefebvre, T. (2024). Probabilistic Majorization of Partially Observable Markov Decision Processes. In: Buckley, C.L., et al. Active Inference. IWAI 2023. Communications in Computer and Information Science, vol 1915. Springer, Cham. https://doi.org/10.1007/978-3-031-47958-8_17

  • DOI: https://doi.org/10.1007/978-3-031-47958-8_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47957-1

  • Online ISBN: 978-3-031-47958-8

  • eBook Packages: Computer Science, Computer Science (R0)
