Abstract
This study measures the discrepancy between states and observations in a Partially Observable Markov Decision Process (POMDP). The gap between states and observations is formalized as the State-Observation-Gap (SOG) problem, denoted \(\varDelta\), in which states and observations are treated as sets. The study also introduces Observation Confidence (OC), an indicator of how reliable an observation is, and establishes a positive correlation between OC and \(\varDelta\). To compute the cumulative entropy \(\lambda\) of rewards in \(\langle o, a, \cdot \rangle\), we propose two weighting algorithms, Universal Weighting and Specific Weighting. Theoretical analysis and experiments in the Cliff Walking environment demonstrate that both algorithms are effective in determining \(\varDelta\) and OC.
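The abstract's cumulative entropy \(\lambda\) of rewards in \(\langle o, a, \cdot \rangle\) can be illustrated with a minimal sketch: group logged rewards by observation-action pair and take the Shannon entropy of each empirical reward distribution. The function name `reward_entropy` and the sample transition log below are hypothetical illustrations, not the paper's implementation, and the weighting step of the proposed algorithms is omitted.

```python
import math
from collections import Counter, defaultdict

def reward_entropy(rewards):
    """Shannon entropy (bits) of an empirical reward distribution."""
    counts = Counter(rewards)
    n = len(rewards)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical transition log of (observation, action, reward) triples,
# e.g. collected while an agent walks a Cliff Walking grid.
log = [
    ("o1", "up", -1), ("o1", "up", -1), ("o1", "up", -100),
    ("o1", "right", -1), ("o1", "right", -1),
]

# Group rewards by <o, a, .> and compute the entropy lambda for each pair.
by_oa = defaultdict(list)
for o, a, r in log:
    by_oa[(o, a)].append(r)

lam = {oa: reward_entropy(rs) for oa, rs in by_oa.items()}
# A deterministic reward stream gives zero entropy; mixed rewards
# (as when one observation covers several underlying states) give more.
```

Intuitively, higher reward entropy under a single observation suggests that the observation aggregates several distinct states, which is the kind of signal the SOG formulation quantifies.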
Copyright information
© 2023 IFIP International Federation for Information Processing
Cite this paper
Yu, Y., Ma, Y., Liu, Y., Wong, D., Lei, K., Egas-López, J.V. (2023). Measuring the State-Observation-Gap in POMDPs: An Exploration of Observation Confidence and Weighting Algorithms. In: Maglogiannis, I., Iliadis, L., MacIntyre, J., Dominguez, M. (eds) Artificial Intelligence Applications and Innovations. AIAI 2023. IFIP Advances in Information and Communication Technology, vol 675. Springer, Cham. https://doi.org/10.1007/978-3-031-34111-3_13
Print ISBN: 978-3-031-34110-6
Online ISBN: 978-3-031-34111-3