Abstract
This study measures the discrepancy between states and observations in a Partially Observable Markov Decision Process (POMDP). The gap between states and observations is formalized as the State-Observation-Gap (SOG) problem, denoted \(\varDelta\), in which states and observations are treated as sets. The study also introduces Observation Confidence (OC), an indicator of how reliable an observation is, and establishes a positive correlation between OC and \(\varDelta\). To compute the cumulative entropy \(\lambda\) of rewards in \(\langle o, a, \cdot \rangle\), we propose two weighting algorithms, Universal Weighting and Specific Weighting. Theoretical analysis and experiments in the Cliff Walking environment demonstrate that both algorithms are effective in determining \(\varDelta\) and OC.
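The abstract's cumulative entropy \(\lambda\) of rewards in \(\langle o, a, \cdot \rangle\) can be illustrated with a minimal sketch: group logged rewards by observation-action pair and take the Shannon entropy of each empirical reward distribution. The function name `reward_entropy` and the sample transition log below are hypothetical illustrations, not the paper's implementation, and the weighting step of the proposed algorithms is omitted.

```python
import math
from collections import Counter, defaultdict

def reward_entropy(rewards):
    """Shannon entropy (bits) of an empirical reward distribution."""
    counts = Counter(rewards)
    n = len(rewards)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical transition log of (observation, action, reward) triples,
# e.g. collected while an agent walks a Cliff Walking grid.
log = [
    ("o1", "up", -1), ("o1", "up", -1), ("o1", "up", -100),
    ("o1", "right", -1), ("o1", "right", -1),
]

# Group rewards by <o, a, .> and compute the entropy lambda for each pair.
by_oa = defaultdict(list)
for o, a, r in log:
    by_oa[(o, a)].append(r)

lam = {oa: reward_entropy(rs) for oa, rs in by_oa.items()}
# A deterministic reward stream gives zero entropy; mixed rewards
# (as when one observation covers several underlying states) give more.
```

Intuitively, higher reward entropy under a single observation suggests that the observation aggregates several distinct states, which is the kind of signal the SOG formulation quantifies.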
Copyright information
© 2023 IFIP International Federation for Information Processing
Cite this paper
Yu, Y., Ma, Y., Liu, Y., Wong, D., Lei, K., Egas-López, J.V. (2023). Measuring the State-Observation-Gap in POMDPs: An Exploration of Observation Confidence and Weighting Algorithms. In: Maglogiannis, I., Iliadis, L., MacIntyre, J., Dominguez, M. (eds) Artificial Intelligence Applications and Innovations. AIAI 2023. IFIP Advances in Information and Communication Technology, vol 675. Springer, Cham. https://doi.org/10.1007/978-3-031-34111-3_13
Print ISBN: 978-3-031-34110-6
Online ISBN: 978-3-031-34111-3