Decision-theoretic planning under uncertainty with information rewards for active cooperative perception

Spaan, Matthijs T. J.; Veiga, Tiago S.; Lima, Pedro U.

doi:10.1007/s10458-014-9279-8

Published: 23 December 2014

Autonomous Agents and Multi-Agent Systems, Volume 29, pages 1157–1185 (2015)

Abstract

Partially observable Markov decision processes (POMDPs) provide a principled framework for modeling an agent’s decision-making problem when the agent needs to consider noisy state estimates. POMDP policies take into account an action’s influence on the environment as well as the potential information gain. This is a crucial feature for robotic agents which generally have to consider the effect of actions on sensing. However, building POMDP models which reward information gain directly is not straightforward, but is important in domains such as robot-assisted surveillance in which the value of information is hard to quantify. Common techniques for uncertainty reduction such as expected entropy minimization lead to non-standard POMDPs that are hard to solve. We present the POMDP with Information Rewards (POMDP-IR) modeling framework, which rewards an agent for reaching a certain level of belief regarding a state feature. By remaining in the standard POMDP setting we can exploit many known results as well as successful approximate algorithms. We demonstrate our ideas in a toy problem as well as in real robot-assisted surveillance, showcasing their use for active cooperative perception scenarios. Finally, our experiments show that the POMDP-IR framework compares favorably with a related approach on benchmark domains.
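The abstract contrasts two ways of valuing information: rewarding a level of belief inside a standard POMDP, and rewarding uncertainty reduction (for example, entropy) directly. The sketch below illustrates that contrast under explicit assumptions that are not taken from the paper: a single hypothetical binary state feature (whether a person is at a monitored location), a static state, a hypothetical camera observation model, and a hypothetical "commit" action that pays `r_correct` when the committed value is true and costs `r_incorrect` otherwise. Because such a reward is an ordinary state-action reward R(s, a), its expected value is linear in the belief, whereas entropy is a non-linear function of the belief and cannot be written that way. It is an illustration of the idea, not the authors' POMDP-IR model or code.

```python
import math

# Hypothetical binary state feature: 0 = person absent, 1 = person present.
STATES = (0, 1)

# Hypothetical camera model P(o | s); o = 1 means "detection reported".
OBS_MODEL = {
    0: {0: 0.9, 1: 0.1},  # absent: 10% false-positive rate
    1: {0: 0.2, 1: 0.8},  # present: 80% detection rate
}


def belief_update(belief, obs, transition=None):
    """Bayes filter: b'(s') is proportional to P(o | s') * sum_s T(s' | s) b(s).

    transition[s][s2] = T(s2 | s); None means a static (identity) transition.
    """
    if transition is None:
        predicted = list(belief)
    else:
        predicted = [sum(transition[s][s2] * belief[s] for s in STATES)
                     for s2 in STATES]
    unnormalized = [OBS_MODEL[s][obs] * predicted[s] for s in STATES]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


def commit_reward(belief, value, r_correct, r_incorrect):
    """Expected reward of a hypothetical 'commit to feature == value' action.

    Averages an ordinary state-action reward over the belief, so it is
    linear in b and fits the standard POMDP reward model.
    """
    return belief[value] * r_correct - (1.0 - belief[value]) * r_incorrect


def commit_threshold(r_correct, r_incorrect):
    """Belief level above which committing has positive expected reward."""
    return r_incorrect / (r_correct + r_incorrect)


def entropy(belief):
    """Belief entropy in nats: non-linear in b, unlike commit_reward."""
    return -sum(p * math.log(p) for p in belief if p > 0.0)


if __name__ == "__main__":
    r_c, r_i = 1.0, 3.0                      # hypothetical reward values
    prior = [0.5, 0.5]
    posterior = belief_update(prior, obs=1)  # one detection observed
    print("threshold:", commit_threshold(r_c, r_i))
    print("commit before:", commit_reward(prior, 1, r_c, r_i))
    print("commit after:", round(commit_reward(posterior, 1, r_c, r_i), 3))
    print("entropy before/after:",
          round(entropy(prior), 3), round(entropy(posterior), 3))
```

With the hypothetical values `r_correct = 1` and `r_incorrect = 3`, the threshold is 0.75, obtained by solving b·r_correct − (1 − b)·r_incorrect = 0 for b. From a uniform prior, committing has expected reward −1.0; after a single detection the belief that the person is present rises to 8/9 ≈ 0.89, above the threshold, so committing becomes worthwhile. The threshold is how a reward that is linear in the belief can still encode "reach a certain level of belief".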

Notes

Maintaining an exact belief state over a factorized state space is typically undesirable for reasons of tractability, but bounded approximations are possible [9] in which the exact belief \(b\) is approximated by the product of marginals over the individual state factors, \(b(s) \approx \prod_i b_i(s_i)\). This is a common approach in factored POMDP solving [37].
An alternative option would be to implement the extension described in Sect. 4.4.
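The product-of-marginals approximation in the first note can be sketched as follows: compute the marginal \(b_i\) of each state factor from a joint belief, then rebuild an approximate joint belief as their product. The sketch assumes plain dictionaries keyed by tuples of factor values and uses two hypothetical two-factor beliefs; it shows only the projection step and the resulting KL divergence, not the error bounds of [9] or the implementation used by any particular solver.

```python
import itertools
import math


def marginals(joint, n_factors):
    """Return [b_1, ..., b_n], where b_i maps each value of factor i to its marginal."""
    margs = [{} for _ in range(n_factors)]
    for state, p in joint.items():
        for i, value in enumerate(state):
            margs[i][value] = margs[i].get(value, 0.0) + p
    return margs


def product_of_marginals(margs):
    """Approximate joint belief b(s) ~ prod_i b_i(s_i) over all factor combinations."""
    domains = [sorted(m) for m in margs]
    approx = {}
    for state in itertools.product(*domains):
        p = 1.0
        for i, value in enumerate(state):
            p *= margs[i][value]
        approx[state] = p
    return approx


def kl_divergence(p, q):
    """KL(p || q) in nats; q must be positive wherever p is."""
    return sum(pv * math.log(pv / q[s]) for s, pv in p.items() if pv > 0.0)


if __name__ == "__main__":
    # Hypothetical joint beliefs over two binary factors, keyed by (x1, x2).
    independent = {(0, 0): 0.06, (0, 1): 0.14, (1, 0): 0.24, (1, 1): 0.56}
    correlated = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
    for name, joint in (("independent", independent), ("correlated", correlated)):
        approx = product_of_marginals(marginals(joint, n_factors=2))
        print(name, "KL =", round(kl_divergence(joint, approx), 4))
```

For the independent belief the product reproduces the joint, so the divergence is zero up to floating-point error. For the correlated belief both marginals are uniform, so the product spreads the mass evenly over all four joint states and the divergence is positive (about 0.368 nats): the correlation between factors is what the approximation gives up in exchange for tractability.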

References

[1] Agrawal, R., Realff, M. J., & Lee, J. H. (2013). MILP based value backups in partially observed Markov decision processes (POMDPs) with very large or continuous action and observation spaces. Computers & Chemical Engineering, 56, 101–113.
[2] Amato, C., Konidaris, G., Cruz, G., Maynor, C. A., How, J. P., & Kaelbling, L. P. (2014). Planning for decentralized control of multiple robots under uncertainty. In ICAPS-14 Workshop on Planning and Robotics.
[3] Araya-López, M. (2013). Des algorithmes presque optimaux pour les problèmes de décision séquentielle à des fins de collecte d’information [Near-optimal algorithms for sequential decision problems aimed at information gathering]. PhD thesis, University of Lorraine.
[4] Araya-López, M., Buffet, O., Thomas, V., & Charpillet, F. (2010). A POMDP extension with belief-dependent rewards. In Advances in Neural Information Processing Systems, Vol. 23.
[5] Barbosa, M., Bernardino, A., Figueira, D., Gaspar, J., Gonçalves, N., Lima, P. U., Moreno, P., Pahliani, A., Santos-Victor, J., Spaan, M. T. J., & Sequeira, J. (2009). ISRobotNet: A testbed for sensor and robot network systems. In Proceedings of International Conference on Intelligent Robots and Systems.
[6] Bernstein, D. S., Givan, R., Immerman, N., & Zilberstein, S. (2002). The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27(4), 819–840.
[7] Boger, J., Poupart, P., Hoey, J., Boutilier, C., Fernie, G., & Mihailidis, A. (2005). A decision-theoretic approach to task assistance for persons with dementia. In Proceedings of International Joint Conference on Artificial Intelligence.
[8] Boutilier, C., & Poole, D. (1996). Computing optimal policies for partially observable decision processes using compact representations. In Proceedings of the Thirteenth National Conference on Artificial Intelligence.
[9] Boyen, X., & Koller, D. (1998). Tractable inference for complex stochastic processes. In Proceedings of Uncertainty in Artificial Intelligence.
[10] Brunskill, E., Kaelbling, L., Lozano-Perez, T., & Roy, N. (2008). Continuous-state POMDPs with hybrid dynamics. In Proceedings of the International Symposium on Artificial Intelligence and Mathematics.
[11] Burgard, W., Fox, D., & Thrun, S. (1997). Active mobile robot localization by entropy minimization. In Proceedings of the Second Euromicro Workshop on Advanced Mobile Robots.
[12] Candido, S., & Hutchinson, S. (2011). Minimum uncertainty robot navigation using information-guided POMDP planning. In Proceedings of the International Conference on Robotics and Automation.
[13] Capitán, J., Spaan, M. T. J., Merino, L., & Ollero, A. (2013). Decentralized multi-robot cooperation with auctioned POMDPs. International Journal of Robotics Research, 32(6), 650–671.
[14] Doshi, F., & Roy, N. (2008). The permutable POMDP: Fast solutions to POMDPs for preference elicitation. In Proceedings of International Conference on Autonomous Agents and Multi Agent Systems.
[15] Eck, A., & Soh, L.-K. (2012). Evaluating POMDP rewards for active perception. In Proceedings of International Conference on Autonomous Agents and Multi Agent Systems.
[16] Emery-Montemerlo, R., Gordon, G., Schneider, J., & Thrun, S. (2005). Game theoretic control for robot teams. In Proceedings of the International Conference on Robotics and Automation.
[17] Fern, A., Natarajan, S., Judah, K., & Tadepalli, P. (2007). A decision-theoretic model of assistance. In Proceedings of the International Joint Conference on Artificial Intelligence.
[18] Guestrin, C., Koller, D., & Parr, R. (2001). Solving factored POMDPs with linear value functions. In IJCAI-01 Workshop on Planning under Uncertainty and Incomplete Information.
[19] Guo, A. (2003). Decision-theoretic active sensing for autonomous agents. In Proceedings of the International Conference on Computational Intelligence, Robotics and Autonomous Systems.
[20] Hansen, E. A., & Feng, Z. (2000). Dynamic programming for POMDPs using a factored state representation. In International Conference on Artificial Intelligence Planning and Scheduling.
[21] Hsu, D., Lee, W., & Rong, N. (2008). A point-based POMDP planner for target tracking. In Proceedings of the International Conference on Robotics and Automation.
[22] Ji, S., Parr, R., & Carin, L. (2007). Non-myopic multi-aspect sensing with partially observable Markov decision processes. IEEE Transactions on Signal Processing, 55(6), 2720–2730.
[23] Kaelbling, L. P., Littman, M. L., & Cassandra, A. R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 99–134.
[24] Krause, A., & Guestrin, C. (2007). Near-optimal observation selection using submodular functions. In Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence.
[25] Krause, A., Leskovec, J., Guestrin, C., Vanbriesen, J., & Faloutsos, C. (2008). Efficient sensor placement optimization for securing large water distribution networks. Journal of Water Resources Planning and Management, 134(6), 516–526.
[26] Krause, A., Singh, A., & Guestrin, C. (2008). Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research, 9, 235–284.
[27] Krishnamurthy, V., & Djonin, D. (2007). Structured threshold policies for dynamic sensor scheduling—A partially observed Markov decision process approach. IEEE Transactions on Signal Processing, 55(10), 4938–4957.
[28] Littman, M. L., Cassandra, A. R., & Kaelbling, L. P. (1995). Learning policies for partially observable environments: Scaling up. In International Conference on Machine Learning.
[29] Martinez-Cantin, R., de Freitas, N., Brochu, E., Castellanos, J., & Doucet, A. (2009). A Bayesian exploration–exploitation approach for optimal online sensing and planning with a visually guided mobile robot. Autonomous Robots, 27, 93–103.
[30] Merino, L., Ballesteros, J., Pérez-Higueras, N., Ramón-Vigo, R., Pérez-Lara, J., & Caballero, F. (2014). Robust person guidance by using online POMDPs. In ROBOT2013: First Iberian Robotics Conference. Advances in intelligent systems and computing (Vol. 253). Springer.
[31] Mihaylova, L., Lefebvre, T., Bruyninckx, H., Gadeyne, K., & De Schutter, J. (2003). A comparison of decision making criteria and optimization methods for active robotic sensing. In Numerical methods and applications. LNCS (Vol. 2543). Springer.
[32] Natarajan, P., Hoang, T. N., Low, K. H., & Kankanhalli, M. (2012). Decision-theoretic approach to maximizing observation of multiple targets in multi-camera surveillance. In Proceedings of International Conference on Autonomous Agents and Multi Agent Systems.
[33] Oliehoek, F. A., & Spaan, M. T. J. (2012). Tree-based pruning for multiagent POMDPs with delayed communication. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence.
[34] Oliehoek, F. A., Spaan, M. T. J., & Vlassis, N. (2008). Optimal and approximate Q-value functions for decentralized POMDPs. Journal of Artificial Intelligence Research, 32, 289–353.
[35] Pineau, J., Montemerlo, M., Pollack, M., Roy, N., & Thrun, S. (2003). Towards robotic assistants in nursing homes: Challenges and results. Robotics and Autonomous Systems, 42(3–4), 271–281.
[36] Porta, J. M., Vlassis, N., Spaan, M. T. J., & Poupart, P. (2006). Point-based value iteration for continuous POMDPs. Journal of Machine Learning Research, 7, 2329–2367.
[37] Poupart, P. (2005). Exploiting structure to efficiently solve large scale partially observable Markov decision processes. PhD thesis, University of Toronto.
[38] Pynadath, D. V., & Tambe, M. (2002). The communicative multiagent team decision problem: Analyzing teamwork theories and models. Journal of Artificial Intelligence Research, 16, 389–423.
[39] Roijers, D., Vamplew, P., Whiteson, S., & Dazeley, R. (2013). A survey of multi-objective sequential decision-making. Journal of Artificial Intelligence Research, 48, 67–113.
[40] Ross, S., Pineau, J., Paquet, S., & Chaib-draa, B. (2008). Online planning algorithms for POMDPs. Journal of Artificial Intelligence Research, 32, 664–704.
[41] Roy, N., Burgard, W., Fox, D., & Thrun, S. (1999). Coastal navigation—Mobile robot navigation with uncertainty in dynamic environments. In Proceedings of the International Conference on Robotics and Automation.
[42] Roy, N., Gordon, G., & Thrun, S. (2003). Planning under uncertainty for reliable health care robotics. In Proceedings of the International Conference on Field and Service Robotics.
[43] Roy, N., Gordon, G., & Thrun, S. (2005). Finding approximate POMDP solutions through belief compression. Journal of Artificial Intelligence Research, 23, 1–40.
[44] Sanfeliu, A., Andrade-Cetto, J., Barbosa, M., Bowden, R., Capitán, J., Corominas, A., et al. (2010). Decentralized sensor fusion for ubiquitous networking robotics in urban areas. Sensors, 10(3), 2274–2314.
[45] Scharpff, J., Spaan, M. T. J., Volker, L., & de Weerdt, M. M. (2013). Planning under uncertainty for coordinating infrastructural maintenance. In Proceedings of the International Conference on Automated Planning and Scheduling.
[46] Silver, D., & Veness, J. (2010). Monte-Carlo planning in large POMDPs. In Advances in Neural Information Processing Systems, Vol. 23.
[47] Simmons, R., & Koenig, S. (1995). Probabilistic robot navigation in partially observable environments. In Proceedings of the International Joint Conference on Artificial Intelligence.
[48] Singh, S. S., Kantas, N., Vo, B.-N., Doucet, A., & Evans, R. J. (2007). Simulation-based optimal sensor scheduling with application to observer trajectory planning. Automatica, 43(5), 817–830.
[49] Smith, T., & Simmons, R. (2004). Heuristic search value iteration for POMDPs. In Proceedings of Uncertainty in Artificial Intelligence.
[50] Spaan, M. T. J. (2008). Cooperative active perception using POMDPs. In AAAI 2008 Workshop on Advancements in POMDP Solvers.
[51] Spaan, M. T. J. (2012). Partially observable Markov decision processes. In M. Wiering & M. van Otterlo (Eds.), Reinforcement learning: State of the art. Berlin: Springer.
[52] Spaan, M. T. J., & Lima, P. U. (2009). A decision-theoretic approach to dynamic sensor selection in camera networks. In Proceedings of International Conference on Automated Planning and Scheduling.
[53] Spaan, M. T. J., & Vlassis, N. (2005). Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research, 24, 195–220.
[54] Spaan, M. T. J., Oliehoek, F. A., & Vlassis, N. (2008). Multiagent planning under uncertainty with stochastic communication delays. In Proceedings of International Conference on Automated Planning and Scheduling.
[55] Spaan, M. T. J., Veiga, T. S., & Lima, P. U. (2010). Active cooperative perception in network robot systems using POMDPs. In Proceedings of International Conference on Intelligent Robots and Systems.
[56] Stachniss, C., Grisetti, G., & Burgard, W. (2005). Information gain-based exploration using Rao-Blackwellized particle filters. In Proceedings of Robotics: Science and Systems.
[57] Thrun, S., Burgard, W., & Fox, D. (2005). Probabilistic robotics. Cambridge: MIT Press.
[58] Veiga, T. S., Spaan, M. T. J., & Lima, P. U. (2014). Point-based POMDP solving with factored value function approximation. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence.
[59] Velez, J., Hemann, G., Huang, A. S., Posner, I., & Roy, N. (2011). Planning to perceive: Exploiting mobility for robust object detection. In Proceedings of International Conference on Automated Planning and Scheduling.
[60] Vlassis, N., Gordon, G., & Pineau, J. (2006). Planning under uncertainty in robotics. Robotics and Autonomous Systems, 54(11). Special issue.
[61] Williams, J. D., & Young, S. (2007). Partially observable Markov decision processes for spoken dialog systems. Computer Speech and Language, 21(2), 393–422.
[62] Zhang, S., & Sridharan, M. (2012). Active visual sensing and collaboration on mobile robots using hierarchical POMDPs. In Proceedings of International Conference on Autonomous Agents and Multi Agent Systems.

Acknowledgments

This work was partially supported by Fundação para a Ciência e a Tecnologia (FCT) through grant SFRH/BD/70559/2010 (T.V.), as well as by FCT ISR/LARSyS strategic funding PEst-OE/EEI/LA0009/2013, and the FCT project CMU-PT/SIA/0023/2009 under the Carnegie Mellon-Portugal Program. We thank Shimon Whiteson for useful discussions.

Author information

Authors and Affiliations

Delft University of Technology, Delft, The Netherlands
Matthijs T. J. Spaan
Institute for Systems and Robotics, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
Tiago S. Veiga & Pedro U. Lima

Corresponding author

Correspondence to Matthijs T. J. Spaan.

About this article

Cite this article

Spaan, M.T.J., Veiga, T.S. & Lima, P.U. Decision-theoretic planning under uncertainty with information rewards for active cooperative perception. Auton Agent Multi-Agent Syst 29, 1157–1185 (2015). https://doi.org/10.1007/s10458-014-9279-8

Issue Date: November 2015
DOI: https://doi.org/10.1007/s10458-014-9279-8
