ABSTRACT
We address the problem of learning state-variable relationships across different episodes in Partially Observable Markov Decision Processes (POMDPs) to improve planning performance. Specifically, we focus on Partially Observable Monte Carlo Planning (POMCP) and represent the acquired knowledge with Markov Random Fields (MRFs). We propose three methods for computing MRF parameters while the agent acts in the environment. Our techniques acquire information from the outcomes of agent actions and from the agent's belief, which summarizes the knowledge acquired from observations. We also propose a stopping criterion that determines when the MRF is accurate enough and the learning process can be stopped. Results show that the proposed approach effectively learns state-variable probabilistic constraints and outperforms standard POMCP with no computational overhead.
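To make the idea concrete, here is a minimal sketch of one plausible instantiation: pairwise MRF potentials over binary state variables estimated from empirical co-occurrence frequencies in belief particles, with learning stopped once the potentials stabilize. The class name, the counting-based update, and the tolerance-based stopping test are illustrative assumptions, not the paper's actual three methods or its specific stopping criterion.

```python
# Hypothetical sketch (not the paper's method): estimate pairwise MRF
# potentials over binary state variables from belief particles, and stop
# learning when the potentials stop changing.
import numpy as np

class PairwiseMRFLearner:
    def __init__(self, n_vars, tol=1e-3):
        self.n = n_vars
        self.tol = tol
        # counts[i, j] accumulates a 2x2 co-occurrence table for the pair
        # of binary variables (x_i, x_j); initialized to 1 (Laplace prior).
        self.counts = np.ones((n_vars, n_vars, 2, 2))
        self.prev_potentials = self._potentials()

    def _potentials(self):
        # Normalize each 2x2 table into an empirical pairwise joint; these
        # normalized tables play the role of the MRF pairwise potentials.
        totals = self.counts.sum(axis=(2, 3), keepdims=True)
        return self.counts / totals

    def update_from_belief(self, particles):
        # particles: (n_particles, n_vars) array of 0/1 state samples,
        # e.g. the particle set of a POMCP belief at the end of an episode.
        for s in particles:
            for i in range(self.n):
                for j in range(i + 1, self.n):
                    self.counts[i, j, s[i], s[j]] += 1

    def converged(self):
        # Assumed stopping criterion: max absolute change in the estimated
        # potentials between consecutive updates falls below `tol`.
        cur = self._potentials()
        delta = np.abs(cur - self.prev_potentials).max()
        self.prev_potentials = cur
        return delta < self.tol

# Usage: after each episode, feed the belief particles to the learner and
# stop updating once the estimates have stabilized.
rng = np.random.default_rng(0)
learner = PairwiseMRFLearner(n_vars=4)
for episode in range(50):
    particles = rng.integers(0, 2, size=(100, 4))  # stand-in for a belief
    learner.update_from_belief(particles)
    if learner.converged():
        break
```

Once learned, such potentials can act as soft probabilistic constraints on which joint state configurations the planner should consider likely; the paper additionally exploits action outcomes, which this sketch omits.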