Research Article | Open Access
DOI: 10.1145/3477314.3507049

Learning state-variable relationships for improving POMCP performance

Published: 06 May 2022

ABSTRACT

We address the problem of learning state-variable relationships across episodes in Partially Observable Markov Decision Processes (POMDPs) to improve planning performance. Specifically, we focus on Partially Observable Monte Carlo Planning (POMCP), and we represent the acquired knowledge with Markov Random Fields (MRFs). We propose three methods for computing MRF parameters while the agent acts in the environment. Our techniques acquire information from the outcomes of agent actions and from the agent's belief, which summarizes the knowledge gained from observations. We also propose a stopping criterion to determine when the MRF is accurate enough and the learning process can be stopped. Results show that the proposed approach effectively learns state-variable probabilistic constraints and outperforms standard POMCP with no computational overhead.
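The abstract outlines the approach at a high level but does not detail the three parameter-estimation methods or the stopping criterion. As a purely illustrative aid, the following is a minimal Python sketch, assuming binary state variables, of one plausible scheme: accumulating pairwise co-occurrence statistics from the particles of POMCP's belief across episodes, and stopping once the estimate stabilizes. All names (MRFEstimator, pairwise_counts, eps) and the convergence test are assumptions for illustration, not the paper's actual algorithms.

```python
# Illustrative sketch only: one plausible way to accumulate pairwise
# MRF statistics from POMCP belief particles across episodes, with a
# naive stability-based stopping criterion. Names and thresholds are
# assumptions, not taken from the paper.
import numpy as np

def pairwise_counts(particles):
    """Co-occurrence counts for binary state variables.

    particles: (n_particles, n_vars) array of 0/1 state samples drawn
    from the agent's belief (POMCP's particle filter).
    """
    X = np.asarray(particles, dtype=float)
    return X.T @ X  # counts[i, j] = #particles with x_i = 1 and x_j = 1

class MRFEstimator:
    """Running estimate of pairwise probabilities p(x_i = 1, x_j = 1)."""

    def __init__(self, n_vars, eps=1e-3):
        self.counts = np.zeros((n_vars, n_vars))
        self.total = 0          # particles seen so far
        self.eps = eps          # stopping threshold on parameter change
        self.prev = None        # previous parameter snapshot

    def update(self, particles):
        """Fold one batch of belief particles into the running counts."""
        self.counts += pairwise_counts(particles)
        self.total += len(particles)

    def params(self):
        """Current empirical pairwise probabilities."""
        return self.counts / max(self.total, 1)

    def converged(self):
        """Stop when the estimate moved less than eps since the last
        check; a stand-in for the paper's stopping criterion."""
        p = self.params()
        done = self.prev is not None and float(np.abs(p - self.prev).max()) < self.eps
        self.prev = p
        return done

# Hypothetical usage: after each episode, fold the belief particles in;
# once converged, freeze the MRF and use it to bias POMCP's simulations.
rng = np.random.default_rng(0)
est = MRFEstimator(n_vars=4)
for episode in range(50):
    particles = rng.integers(0, 2, size=(100, 4))  # stand-in for real belief particles
    est.update(particles)
    if est.converged():
        break
```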


Published in

SAC '22: Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing
April 2022, 2099 pages
ISBN: 9781450387132
DOI: 10.1145/3477314
Copyright © 2022 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%