
Measuring the State-Observation-Gap in POMDPs: An Exploration of Observation Confidence and Weighting Algorithms

  • Conference paper
Artificial Intelligence Applications and Innovations (AIAI 2023)

Abstract

This study measures the discrepancy between states and observations in the Partially Observable Markov Decision Process (POMDP). The gap between states and observations is formulated as the State-Observation-Gap (SOG) problem, denoted \(\varDelta \), in which states and observations are treated as sets. The study also introduces Observation Confidence (OC), an indicator of how reliable an observation is, and establishes a positive correlation between OC and \(\varDelta \). To compute the cumulative entropy \(\lambda \) of rewards in \(\langle o, a, \cdot \rangle \), we propose two weighting algorithms, Universal Weighting and Specific Weighting. Empirical and theoretical assessments in the Cliff Walking environment attest to the effectiveness of both algorithms in determining \(\varDelta \) and OC.
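
Since only the abstract is reproduced here, the snippet below is a minimal sketch rather than the paper's Universal or Specific Weighting algorithms: it empirically estimates the entropy of rewards observed under each \(\langle o, a \rangle \) pair in Cliff Walking, a rough stand-in for the cumulative reward entropy \(\lambda \) described above. The Gymnasium CliffWalking-v0 environment, the uniform observation-noise level, and the random behaviour policy are all assumptions introduced here for illustration.

```python
# A minimal sketch, not the paper's method: estimate the Shannon entropy
# of rewards seen under each (observation, action) pair in Cliff Walking.
import math
import random
from collections import Counter, defaultdict

import gymnasium as gym

# Assumed noise level: probability that an observation differs from the
# true state. The paper's actual observation model is not given here.
NOISE = 0.2

env = gym.make("CliffWalking-v0", max_episode_steps=200)

def observe(state: int) -> int:
    """Return a possibly corrupted observation of the true state."""
    if random.random() < NOISE:
        return random.randrange(env.observation_space.n)
    return state

# reward_counts[(o, a)] tallies each reward received after acting on o.
reward_counts = defaultdict(Counter)

for _ in range(200):  # a few hundred random-policy episodes
    state, _ = env.reset()
    done = False
    while not done:
        obs = observe(state)
        action = env.action_space.sample()
        state, reward, terminated, truncated, _ = env.step(action)
        reward_counts[(obs, action)][reward] += 1
        done = terminated or truncated

def reward_entropy(counts: Counter) -> float:
    """Shannon entropy (bits) of an empirical reward distribution."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# High entropy marks <o, a> pairs whose rewards are inconsistent, i.e.
# observations that are unreliable proxies for the underlying state.
lam = {oa: reward_entropy(c) for oa, c in reward_counts.items()}
worst = max(lam, key=lam.get)
print(f"highest reward entropy: {worst} -> {lam[worst]:.3f} bits")
```

An \(\langle o, a \rangle \) pair whose rewards are spread across many values yields high entropy, which under the abstract's framing flags observations that are less reliable proxies for the underlying state.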

Author information

Correspondence to Yide Yu.

Copyright information

© 2023 IFIP International Federation for Information Processing

About this paper

Cite this paper

Yu, Y., Ma, Y., Liu, Y., Wong, D., Lei, K., Egas-López, J.V. (2023). Measuring the State-Observation-Gap in POMDPs: An Exploration of Observation Confidence and Weighting Algorithms. In: Maglogiannis, I., Iliadis, L., MacIntyre, J., Dominguez, M. (eds) Artificial Intelligence Applications and Innovations. AIAI 2023. IFIP Advances in Information and Communication Technology, vol 675. Springer, Cham. https://doi.org/10.1007/978-3-031-34111-3_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-34111-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-34110-6

  • Online ISBN: 978-3-031-34111-3

  • eBook Packages: Computer Science, Computer Science (R0)
