Abstract
Partially observable Markov decision processes (POMDPs) provide a principled framework for modeling an agent’s decision-making problem when the agent needs to consider noisy state estimates. POMDP policies take into account an action’s influence on the environment as well as the potential information gain. This is a crucial feature for robotic agents, which generally have to consider the effect of their actions on sensing. However, building POMDP models that reward information gain directly is not straightforward, yet it is important in domains such as robot-assisted surveillance, in which the value of information is hard to quantify. Common techniques for uncertainty reduction, such as expected entropy minimization, lead to non-standard POMDPs that are hard to solve. We present the POMDP with Information Rewards (POMDP-IR) modeling framework, which rewards an agent for reaching a certain level of belief regarding a state feature. By remaining in the standard POMDP setting, we can exploit many known results as well as successful approximate algorithms. We demonstrate our ideas on a toy problem as well as in a real robot-assisted surveillance setting, showcasing their use for active cooperative perception scenarios. Finally, our experiments show that the POMDP-IR framework compares favorably with a related approach on benchmark domains.
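To illustrate why entropy-based rewards leave the standard POMDP setting while a belief-threshold reward need not, recall that in a standard POMDP the expected immediate reward of action \(a\) in belief \(b\) is linear in the belief, \(\rho(b,a)=\sum_s b(s)R(s,a)\), whereas entropy is not. The sketch below is a minimal illustration under stated assumptions: a single binary state feature and a hypothetical "commit" action that earns `R_CORRECT` when the feature is true and costs `R_INCORRECT` when it is false. These names and values are illustrative, not the paper's exact model. The ordinary state-based reward pays off in expectation only once the belief exceeds `R_INCORRECT / (R_CORRECT + R_INCORRECT)`, so it rewards reaching a certain belief level while remaining linear in \(b\).

```python
import math

def expected_reward(belief, reward_per_state):
    """Standard POMDP reward: linear in the belief, rho(b, a) = sum_s b(s) R(s, a)."""
    return sum(p * r for p, r in zip(belief, reward_per_state))

def entropy(belief):
    """Shannon entropy in bits; nonlinear in b, so not expressible as sum_s b(s) R(s, a)."""
    return -sum(p * math.log2(p) for p in belief if p > 0)

# Hypothetical reward values for committing to "feature x is true" (assumptions).
R_CORRECT = 1.0    # reward if x is actually true
R_INCORRECT = 3.0  # penalty if x is actually false

def commit_reward(p_true):
    """Expected reward of the assumed commit action under belief b(x) = p_true."""
    return expected_reward([p_true, 1.0 - p_true], [R_CORRECT, -R_INCORRECT])

# Belief level above which committing has positive expected reward.
threshold = R_INCORRECT / (R_CORRECT + R_INCORRECT)

for p in (0.5, 0.75, 0.9):
    print(f"b(x)={p:.2f}  commit={commit_reward(p):+.2f}  entropy={entropy([p, 1 - p]):.3f}")
```

Raising `R_INCORRECT` relative to `R_CORRECT` raises the belief level the agent must reach before committing is worthwhile, which is one way to tune how certain the agent should become about a feature.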
Notes
Maintaining an exact belief state in factorized form is typically undesirable for tractability reasons, but bounded approximations are possible [9], in which the exact belief \(b\) is approximated by the product of the marginals \(b_i\) over the individual state factors. This is a common approach in factorized POMDP solving [37].
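A minimal sketch of this projection, assuming the exact joint belief is stored as a NumPy array with one axis per state factor (a representation chosen here for illustration only). It computes the marginals \(b_i\), rebuilds the approximate belief as their product, and reports the KL divergence between the exact and approximate beliefs as a measure of the approximation error. The helper names are hypothetical.

```python
import numpy as np

def marginals(joint):
    """Return the marginal b_i of each state factor (one per array axis)."""
    axes = range(joint.ndim)
    return [joint.sum(axis=tuple(a for a in axes if a != i)) for i in axes]

def product_of_marginals(factors):
    """Approximate the joint belief as the outer product of the factor marginals."""
    approx = factors[0]
    for b_i in factors[1:]:
        approx = np.multiply.outer(approx, b_i)
    return approx

def kl_divergence(p, q):
    """KL(p || q) in nats; zero only if q matches p wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Exact belief over two binary state factors whose values are correlated.
b = np.array([[0.4, 0.1],
              [0.1, 0.4]])
b_hat = product_of_marginals(marginals(b))
print(b_hat)
print(kl_divergence(b, b_hat))
```

Here both marginals are uniform, so the product discards the correlation entirely; if the factors were independent under \(b\), the product would reproduce \(b\) exactly and the divergence would be zero.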
An alternative would be to implement the extension described in Sect. 4.4.
References
Agrawal, R., Realff, M. J., & Lee, J. H. (2013). MILP based value backups in partially observed Markov decision processes (POMDPs) with very large or continuous action and observation spaces. Computers & Chemical Engineering, 56, 101–113.
Amato, C., Konidaris, G., Cruz, G., Maynor, C. A., How, J. P., & Kaelbling, L. P. (2014). Planning for decentralized control of multiple robots under uncertainty. In ICAPS-14 Workshop on Planning and Robotics.
Araya-López, M. (2013). Des algorithmes presque optimaux pour les problèmes de décision séquentielle à des fins de collecte d’information [Near-optimal algorithms for sequential decision problems aimed at information gathering]. PhD thesis, University of Lorraine.
Araya-López, M., Buffet, O., Thomas, V., & Charpillet, F. (2010). A POMDP extension with belief-dependent rewards. In Advances in Neural Information Processing Systems, Vol. 23.
Barbosa, M., Bernardino, A., Figueira, D., Gaspar, J., Gonçalves, N., Lima, P. U., Moreno, P., Pahliani, A., Santos-Victor, J., Spaan, M. T. J., & Sequeira, J. (2009). ISRobotNet: A testbed for sensor and robot network systems. In Proceedings of International Conference on Intelligent Robots and Systems.
Bernstein, D. S., Givan, R., Immerman, N., & Zilberstein, S. (2002). The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27(4), 819–840.
Boger, J., Poupart, P., Hoey, J., Boutilier, C., Fernie, G., & Mihailidis, A. (2005). A decision-theoretic approach to task assistance for persons with dementia. In Proceedings of International Joint Conference on Artificial Intelligence.
Boutilier, C., & Poole, D. (1996). Computing optimal policies for partially observable decision processes using compact representations. In Proceedings of the Thirteenth National Conference on Artificial Intelligence.
Boyen, X., & Koller, D. (1998). Tractable inference for complex stochastic processes. In Proceedings of Uncertainty in Artificial Intelligence.
Brunskill, E., Kaelbling, L., Lozano-Pérez, T., & Roy, N. (2008). Continuous-state POMDPs with hybrid dynamics. In Proceedings of the International Symposium on Artificial Intelligence and Mathematics.
Burgard, W., Fox, D., & Thrun, S. (1997). Active mobile robot localization by entropy minimization. In Proceedings of the Second Euromicro Workshop on Advanced Mobile Robots.
Candido, S., & Hutchinson, S. (2011). Minimum uncertainty robot navigation using information-guided POMDP planning. In Proceedings of the International Conference on Robotics and Automation.
Capitán, J., Spaan, M. T. J., Merino, L., & Ollero, A. (2013). Decentralized multi-robot cooperation with auctioned POMDPs. International Journal of Robotics Research, 32(6), 650–671.
Doshi, F., & Roy, N. (2008). The permutable POMDP: Fast solutions to POMDPs for preference elicitation. In Proceedings of International Conference on Autonomous Agents and Multi Agent Systems.
Eck, A., & Soh, L.-K. (2012). Evaluating POMDP rewards for active perception. In Proceedings of International Conference on Autonomous Agents and Multi Agent Systems.
Emery-Montemerlo, R., Gordon, G., Schneider, J., & Thrun, S. (2005). Game theoretic control for robot teams. In Proceedings of the International Conference on Robotics and Automation.
Fern, A., Natarajan, S., Judah, K., & Tadepalli, P. (2007). A decision-theoretic model of assistance. In Proceedings of the International Joint Conference on Artificial Intelligence.
Guestrin, C., Koller, D., & Parr, R. (2001). Solving factored POMDPs with linear value functions. In IJCAI-01 Workshop on Planning under Uncertainty and Incomplete Information.
Guo, A. (2003). Decision-theoretic active sensing for autonomous agents. In Proceedings of the International Conference on Computational Intelligence, Robotics and Autonomous Systems.
Hansen, E. A., & Feng, Z. (2000). Dynamic programming for POMDPs using a factored state representation. In International Conference on Artificial Intelligence Planning and Scheduling.
Hsu, D., Lee, W., & Rong, N. (2008). A point-based POMDP planner for target tracking. In Proceedings of the International Conference on Robotics and Automation.
Ji, S., Parr, R., & Carin, L. (2007). Non-myopic multi-aspect sensing with partially observable Markov decision processes. IEEE Transactions on Signal Processing, 55(6), 2720–2730.
Kaelbling, L. P., Littman, M. L., & Cassandra, A. R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 99–134.
Krause, A., & Guestrin, C. (2007). Near-optimal observation selection using submodular functions. In Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence.
Krause, A., Leskovec, J., Guestrin, C., Vanbriesen, J., & Faloutsos, C. (2008). Efficient sensor placement optimization for securing large water distribution networks. Journal of Water Resources Planning and Management, 134(6), 516–526.
Krause, A., Singh, A., & Guestrin, C. (2008). Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research, 9, 235–284.
Krishnamurthy, V., & Djonin, D. (2007). Structured threshold policies for dynamic sensor scheduling—A partially observed Markov decision process approach. IEEE Transactions on Signal Processing, 55(10), 4938–4957.
Littman, M. L., Cassandra, A. R., & Kaelbling, L. P. (1995). Learning policies for partially observable environments: Scaling up. In International Conference on Machine Learning.
Martinez-Cantin, R., de Freitas, N., Brochu, E., Castellanos, J., & Doucet, A. (2009). A Bayesian exploration–exploitation approach for optimal online sensing and planning with a visually guided mobile robot. Autonomous Robots, 27, 93–103.
Merino, L., Ballesteros, J., Pérez-Higueras, N., Ramón-Vigo, R., Pérez-Lara, J., & Caballero, F. (2014). Robust person guidance by using online POMDPs. In ROBOT2013: First Iberian Robotics Conference. Advances in intelligent systems and computing (Vol. 253). Springer.
Mihaylova, L., Lefebvre, T., Bruyninckx, H., Gadeyne, K., & De Schutter, J. (2003). A comparison of decision making criteria and optimization methods for active robotic sensing. In Numerical methods and applications. LNCS (Vol. 2543). Springer.
Natarajan, P., Hoang, T. N., Low, K. H., & Kankanhalli, M. (2012). Decision-theoretic approach to maximizing observation of multiple targets in multi-camera surveillance. In Proceedings of International Conference on Autonomous Agents and Multi Agent Systems.
Oliehoek, F. A., & Spaan, M. T. J. (2012). Tree-based pruning for multiagent POMDPs with delayed communication. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence.
Oliehoek, F. A., Spaan, M. T. J., & Vlassis, N. (2008). Optimal and approximate Q-value functions for decentralized POMDPs. Journal of Artificial Intelligence Research, 32, 289–353.
Pineau, J., Montemerlo, M., Pollack, M., Roy, N., & Thrun, S. (2003). Towards robotic assistants in nursing homes: Challenges and results. Robotics and Autonomous Systems, 42(3–4), 271–281.
Porta, J. M., Vlassis, N., Spaan, M. T. J., & Poupart, P. (2006). Point-based value iteration for continuous POMDPs. Journal of Machine Learning Research, 7, 2329–2367.
Poupart, P. (2005). Exploiting structure to efficiently solve large scale partially observable Markov decision processes. PhD thesis, University of Toronto.
Pynadath, D. V., & Tambe, M. (2002). The communicative multiagent team decision problem: Analyzing teamwork theories and models. Journal of Artificial Intelligence Research, 16, 389–423.
Roijers, D., Vamplew, P., Whiteson, S., & Dazeley, R. (2013). A survey of multi-objective sequential decision-making. Journal of Artificial Intelligence Research, 48, 67–113.
Ross, S., Pineau, J., Paquet, S., & Chaib-draa, B. (2008). Online planning algorithms for POMDPs. Journal of Artificial Intelligence Research, 32, 664–704.
Roy, N., Burgard, W., Fox, D., & Thrun, S. (1999). Coastal navigation—Mobile robot navigation with uncertainty in dynamic environments. In Proceedings of the International Conference on Robotics and Automation.
Roy, N., Gordon, G., & Thrun, S. (2003). Planning under uncertainty for reliable health care robotics. In Proceedings of the International Conference on Field and Service Robotics.
Roy, N., Gordon, G., & Thrun, S. (2005). Finding approximate POMDP solutions through belief compression. Journal of Artificial Intelligence Research, 23, 1–40.
Sanfeliu, A., Andrade-Cetto, J., Barbosa, M., Bowden, R., Capitán, J., Corominas, A., et al. (2010). Decentralized sensor fusion for ubiquitous networking robotics in urban areas. Sensors, 10(3), 2274–2314.
Scharpff, J., Spaan, M. T. J., Volker, L., & de Weerdt, M. M. (2013). Planning under uncertainty for coordinating infrastructural maintenance. In Proceedings of the International Conference on Automated Planning and Scheduling.
Silver, D., & Veness, J. (2010). Monte-Carlo planning in large POMDPs. In Advances in Neural Information Processing Systems, Vol. 23.
Simmons, R., & Koenig, S. (1995). Probabilistic robot navigation in partially observable environments. In Proceedings of the International Joint Conference on Artificial Intelligence.
Singh, S. S., Kantas, N., Vo, B.-N., Doucet, A., & Evans, R. J. (2007). Simulation-based optimal sensor scheduling with application to observer trajectory planning. Automatica, 43(5), 817–830.
Smith, T., & Simmons, R. (2004). Heuristic search value iteration for POMDPs. In Proceedings of Uncertainty in Artificial Intelligence.
Spaan, M. T. J. (2008). Cooperative active perception using POMDPs. In AAAI 2008 Workshop on Advancements in POMDP Solvers.
Spaan, M. T. J. (2012). Partially observable Markov decision processes. In M. Wiering & M. van Otterlo (Eds.), Reinforcement learning: State of the art. Berlin: Springer.
Spaan, M. T. J., & Lima, P. U. (2009). A decision-theoretic approach to dynamic sensor selection in camera networks. In Proceedings of International Conference on Automated Planning and Scheduling.
Spaan, M. T. J., & Vlassis, N. (2005). Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research, 24, 195–220.
Spaan, M. T. J., Oliehoek, F. A., & Vlassis, N. (2008). Multiagent planning under uncertainty with stochastic communication delays. In Proceedings of International Conference on Automated Planning and Scheduling.
Spaan, M. T. J., Veiga, T. S., & Lima, P. U. (2010). Active cooperative perception in network robot systems using POMDPs. In Proceedings of International Conference on Intelligent Robots and Systems.
Stachniss, C., Grisetti, G., & Burgard, W. (2005). Information gain-based exploration using Rao-Blackwellized particle filters. In Proceedings of Robotics: Science and Systems.
Thrun, S., Burgard, W., & Fox, D. (2005). Probabilistic robotics. Cambridge: MIT Press.
Veiga, T. S., Spaan, M. T. J., & Lima, P. U. (2014). Point-based POMDP solving with factored value function approximation. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence.
Velez, J., Hemann, G., Huang, A. S., Posner, I., & Roy, N. (2011). Planning to perceive: Exploiting mobility for robust object detection. In Proceedings of International Conference on Automated Planning and Scheduling.
Vlassis, N., Gordon, G., & Pineau, J. (2006). Planning under uncertainty in robotics. Robotics and Autonomous Systems, 54(11). Special issue.
Williams, J. D., & Young, S. (2007). Partially observable Markov decision processes for spoken dialog systems. Computer Speech and Language, 21(2), 393–422.
Zhang, S., & Sridharan, M. (2012). Active visual sensing and collaboration on mobile robots using hierarchical POMDPs. In Proceedings of International Conference on Autonomous Agents and Multi Agent Systems.
Acknowledgments
This work was partially supported by Fundação para a Ciência e a Tecnologia (FCT) through grant SFRH/BD/70559/2010 (T.V.), as well as by FCT ISR/LARSyS strategic funding PEst-OE/EEI/LA0009/2013, and the FCT project CMU-PT/SIA/0023/2009 under the Carnegie Mellon-Portugal Program. We thank Shimon Whiteson for useful discussions.
About this article
Cite this article
Spaan, M.T.J., Veiga, T.S. & Lima, P.U. Decision-theoretic planning under uncertainty with information rewards for active cooperative perception. Auton Agent Multi-Agent Syst 29, 1157–1185 (2015). https://doi.org/10.1007/s10458-014-9279-8