ABSTRACT
We address the problem of learning state-variable relationships across different episodes in Partially Observable Markov Decision Processes (POMDPs) to improve planning performance. Specifically, we focus on Partially Observable Monte Carlo Planning (POMCP) and represent the acquired knowledge with Markov Random Fields (MRFs). We propose three methods for computing MRF parameters while the agent acts in the environment. Our techniques acquire information from the outcomes of agent actions and from the agent's belief, which summarizes the knowledge acquired from observations. We also propose a stopping criterion that determines when the MRF is accurate enough and the learning process can be stopped. Results show that the proposed approach effectively learns state-variable probabilistic constraints and outperforms standard POMCP with no computational overhead.
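To make the idea concrete, here is a minimal sketch of one plausible instantiation: pairwise MRF potentials over binary state variables estimated from empirical co-occurrence frequencies in belief particles, with learning stopped once the potentials stabilize. The class name, the counting-based update, and the tolerance-based stopping test are illustrative assumptions, not the paper's actual three methods or its specific stopping criterion.

```python
# Hypothetical sketch (not the paper's method): estimate pairwise MRF
# potentials over binary state variables from belief particles, and stop
# learning when the potentials stop changing.
import numpy as np

class PairwiseMRFLearner:
    def __init__(self, n_vars, tol=1e-3):
        self.n = n_vars
        self.tol = tol
        # counts[i, j] accumulates a 2x2 co-occurrence table for the pair
        # of binary variables (x_i, x_j); initialized to 1 (Laplace prior).
        self.counts = np.ones((n_vars, n_vars, 2, 2))
        self.prev_potentials = self._potentials()

    def _potentials(self):
        # Normalize each 2x2 table into an empirical pairwise joint; these
        # normalized tables play the role of the MRF pairwise potentials.
        totals = self.counts.sum(axis=(2, 3), keepdims=True)
        return self.counts / totals

    def update_from_belief(self, particles):
        # particles: (n_particles, n_vars) array of 0/1 state samples,
        # e.g. the particle set of a POMCP belief at the end of an episode.
        for s in particles:
            for i in range(self.n):
                for j in range(i + 1, self.n):
                    self.counts[i, j, s[i], s[j]] += 1

    def converged(self):
        # Assumed stopping criterion: max absolute change in the estimated
        # potentials between consecutive updates falls below `tol`.
        cur = self._potentials()
        delta = np.abs(cur - self.prev_potentials).max()
        self.prev_potentials = cur
        return delta < self.tol

# Usage: after each episode, feed the belief particles to the learner and
# stop updating once the estimates have stabilized.
rng = np.random.default_rng(0)
learner = PairwiseMRFLearner(n_vars=4)
for episode in range(50):
    particles = rng.integers(0, 2, size=(100, 4))  # stand-in for a belief
    learner.update_from_belief(particles)
    if learner.converged():
        break
```

Once learned, such potentials can act as soft probabilistic constraints on which joint state configurations the planner should consider likely; the paper additionally exploits action outcomes, which this sketch omits.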