Partially Observable Markov Decision Process Approximations for Adaptive Sensing

Abstract

Adaptive sensing involves actively managing sensor resources to achieve a sensing task, such as object detection, classification, or tracking, and represents a promising direction for new applications of discrete event system methods. We describe an approach to adaptive sensing based on approximately solving a partially observable Markov decision process (POMDP) formulation of the problem. Such approximations are necessary because the very large state spaces of practical adaptive sensing problems preclude exact computation of optimal solutions. We review the theory of POMDPs and show how it applies to adaptive sensing problems. We then describe a variety of approximation methods, with examples illustrating their application to adaptive sensing. The examples also demonstrate the gains that nonmyopic methods can achieve relative to myopic methods, and highlight how those gains depend on the sensing resources and the environment.
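
Among the approximation methods the paper describes are rollout-type lookahead policies (cf. Bertsekas and Castanon 1999 and Chang et al. 2004 in the references below). As an illustrative sketch only, and not the authors' implementation, the following Python fragment estimates Q-values by Monte Carlo rollout of a base policy; the simulator interface step(belief, action) -> (next_belief, reward) and the name base_policy are hypothetical assumptions, standing in for whatever tracking simulation is at hand.

# Illustrative sketch only (not the paper's code): Monte Carlo rollout
# estimation of Q-values for a generic simulation model of a POMDP.
# The interface -- step(belief, action) returning (next_belief, reward),
# and base_policy(belief) returning an action -- is a hypothetical assumption.

def rollout_q(belief, action, step, base_policy, horizon, num_runs=100):
    """Estimate Q(belief, action): apply `action` once, then follow the
    (typically myopic) base policy for the remaining horizon, averaging
    the accumulated reward over `num_runs` simulated trajectories."""
    total = 0.0
    for _ in range(num_runs):
        b, r = step(belief, action)       # candidate action first
        ret = r
        for _ in range(horizon - 1):      # then follow the base policy
            b, r = step(b, base_policy(b))
            ret += r
        total += ret
    return total / num_runs

def rollout_action(belief, actions, step, base_policy, horizon, num_runs=100):
    """Nonmyopic choice: the action maximizing the rollout Q-value estimate."""
    return max(actions, key=lambda a: rollout_q(belief, a, step, base_policy,
                                                horizon, num_runs))

Under exact Q-value evaluation, a rollout policy performs at least as well as its base policy; the Monte Carlo averaging above trades that guarantee for tractability, which is the flavor of the myopic-versus-nonmyopic comparisons referred to in the abstract.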

Notes

  1. For the case where \(\mathcal{S}\) represents target kinematic states in Cartesian coordinates, we typically use the Euclidean norm for this metric.

  2. In fact, given a POMDP, the Q-value can be viewed as the objective function value for a related problem; see Bertsekas and Tsitsiklis (1996).
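
For concreteness, the Q-value mentioned in this note can be stated in its standard belief-state form; the notation here is generic (in the style of Smallwood and Sondik 1973), not quoted from the paper. With belief state \(b\), action \(a\), observation \(z\), expected immediate reward \(r(b,a)=\sum_{s\in\mathcal{S}} b(s)\,r(s,a)\), and Bayes belief update \(\tau(b,a,z)\),

\[ Q(b,a) \;=\; r(b,a) + \sum_{z} P(z \mid b,a)\, V^*\big(\tau(b,a,z)\big), \]

where \(V^*\) is the optimal value function of the belief-state MDP. Approximating \(V^*\) (e.g., by rollout or sparse sampling) yields the approximate Q-values on which the nonmyopic methods in the paper are built.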

References

  • Altman E (1998) Constrained Markov decision processes. Chapman and Hall/CRC, London

  • Bartels R, Backus S, Zeek E, Misoguti L, Vdovin G, Christov IP, Murnane MM, Kapteyn HC (2000) Shaped-pulse optimization of coherent soft X-rays. Nature 406:164–166

  • Bellman R (1957) Dynamic programming. Princeton University Press, Princeton

  • Bertsekas DP (2005) Dynamic programming and suboptimal control: a survey from ADP to MPC. In: Proc. joint 44th IEEE conf. on decision and control and European control conf., Seville, 12–15 December 2005

  • Bertsekas DP (2007) Dynamic programming and optimal control, vol I, 3rd edn, 2005; vol II, 3rd edn. Athena Scientific, Belmont

  • Bertsekas DP, Castanon DA (1999) Rollout algorithms for stochastic scheduling problems. Journal of Heuristics 5:89–108

  • Bertsekas DP, Tsitsiklis JN (1996) Neuro-dynamic programming. Athena Scientific, Belmont

  • Blatt D, Hero AO III (2006a) From weighted classification to policy search. In: Advances in neural information processing systems (NIPS) vol 18, pp 139–146

  • Blatt D, Hero AO III (2006b) Optimal sensor scheduling via classification reduction of policy search (CROPS). In: Proc. int. conf. on automated planning and scheduling (ICAPS)

  • Castanon D (1997) Approximate dynamic programming for sensor management. In: Proc. 36th IEEE conf. on decision and control, San Diego, pp 1202–1207

  • Chang HS, Givan RL, Chong EKP (2004) Parallel rollout for online solution of partially observable Markov decision processes. Discret Event Dyn Syst 14(3):309–341

  • Chang HS, Fu MC, Hu J, Marcus SI (2007) Simulation-based algorithms for Markov decision processes. Springer series in communications and control engineering. Springer, Berlin Heidelberg New York

  • Chen RC, Wagner K (2007) Constrained partially observed Markov decision processes for adaptive waveform scheduling. In: Proc. int. conf. on electromagnetics in advanced applications, Torino, 17–21 September 2007, pp 454–463

  • Cheng HT (1988) Algorithms for partially observable Markov decision processes. PhD dissertation, University of British Columbia

  • Chhetri A, Morrell D, Papandreou-Suppappola A (2004) Efficient search strategies for non-myopic sensor scheduling in target tracking. In: Asilomar conf. on signals, systems, and computers

  • Chong EKP, Givan RL, Chang HS (2000) A framework for simulation-based network control via hindsight optimization. In: Proc. 39th IEEE conf. on decision and control, Sydney, 12–15 December 2000, pp 1433–1438

  • de Farias DP, Van Roy B (2003) The linear programming approach to approximate dynamic programming. Oper Res 51(6):850–865

  • de Farias DP, Van Roy B (2004) On constraint sampling in the linear programming approach to approximate dynamic programming. Math Oper Res 29(3):462–478

  • Gottlieb E, Harrigan R (2001) The Umbra simulation framework. Sandia Tech Report SAND2001-1533 (Unlimited Release)

  • He Y, Chong EKP (2004) Sensor scheduling for target tracking in sensor networks. In: Proc. 43rd IEEE conf. on decision and control (CDC’04), 14–17 December 2004, pp 743–748

  • He Y, Chong EKP (2006) Sensor scheduling for target tracking: a Monte Carlo sampling approach. Digit Signal Process 16(5):533–545

  • Hero A, Castanon D, Cochran D, Kastella K (eds) (2008) Foundations and applications of sensor management. Springer, Berlin Heidelberg New York

  • Ji S, Parr R, Carin L (2007) Nonmyopic multiaspect sensing with partially observable Markov decision processes. IEEE Trans Signal Process 55(6):2720–2730 (Part 1)

  • Julier S, Uhlmann J (2004) Unscented filtering and nonlinear estimation. Proc IEEE 92(3):401–422

  • Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: a survey. J Artif Intell Res 4:237–285

  • Kaelbling LP, Littman ML, Cassandra AR (1998) Planning and acting in partially observable stochastic domains. Artif Intell 101:99–134

  • Kearns MJ, Mansour Y, Ng AY (1999) A sparse sampling algorithm for near-optimal planning in large Markov decision processes. In: Proc. 16th int. joint conf. on artificial intelligence, pp 1324–1331

  • Krakow LW, Li Y, Chong EKP, Groom KN, Harrington J, Rigdon B (2006) Control of perimeter surveillance wireless sensor networks via partially observable Markov decision process. In: Proc. 2006 IEEE int. Carnahan conf. on security technology (ICCST), Lexington, 17–20 October 2006

  • Kreucher CM, Hero A, Kastella K (2005a) A comparison of task driven and information driven sensor management for target tracking. In: Proc. 44th IEEE conf. on decision and control (CDC’05), 12–15 December 2005

  • Kreucher CM, Kastella K, Hero AO III (2005b) Sensor management using an active sensing approach. Signal Process 85(3):607–624

  • Kreucher CM, Kastella K, Hero AO III (2005c) Multitarget tracking using the joint multitarget probability density. IEEE Trans Aerosp Electron Syst 41(4):1396–1414

  • Kreucher CM, Blatt D, Hero AO III, Kastella K (2006) Adaptive multi-modality sensor scheduling for detection and tracking of smart targets. Digit Signal Process 16:546–567

  • Kreucher CM, Hero AO III, Kastella K, Chang D (2004) Efficient methods of non-myopic sensor management for multitarget tracking. In: Proc. 43rd IEEE conf. on decision and control (CDC’04), 14–17 December 2004

  • Krishnamurthy V (2005) Emission management for low probability intercept sensors in network centric warfare. IEEE Trans Aerosp Electron Syst 41(1):133–151

  • Krishnamurthy V, Evans RJ (2001) Hidden Markov model multiarm bandits: a methodology for beam scheduling in multitarget tracking. IEEE Trans Signal Process 49(12):2893–2908

  • Li Y, Krakow LW, Chong EKP, Groom KN (2006) Dynamic sensor management for multisensor multitarget tracking. In: Proc. 40th annual conf. on information sciences and systems, Princeton, 22–24 March 2006, pp 1397–1402

  • Li Y, Krakow LW, Chong EKP, Groom KN (2007) Approximate stochastic dynamic programming for sensor scheduling to track multiple targets. Digit Signal Process. doi:10.1016/j.dsp.2007.05.004

  • Lovejoy WS (1991a) Computationally feasible bounds for partially observed Markov decision processes. Oper Res 39:162–175

  • Lovejoy WS (1991b) A survey of algorithmic methods for partially observed Markov decision processes. Ann Oper Res 28(1):47–65

  • Miller SA, Harris ZA, Chong EKP (2009) A POMDP framework for coordinated guidance of autonomous UAVs for multitarget tracking. EURASIP J Appl Signal Process (Special Issue on Signal Processing Advances in Robots and Autonomy). doi:10.1155/2009/724597

  • Pontryagin LS, Boltyansky VG, Gamkrelidze RV, Mishchenko EF (1962) The mathematical theory of optimal processes. Wiley, New York

  • Powell WB (2007) Approximate dynamic programming: solving the curses of dimensionality. Wiley-Interscience, New York

  • Ristic B, Arulampalam S, Gordon N (2004) Beyond the Kalman filter: particle filters for tracking applications. Artech House, Norwood

  • Roy N, Gordon G, Thrun S (2005) Finding approximate POMDP solutions through belief compression. J Artif Intell Res 23:1–40

  • Rust J (1997) Using randomization to break the curse of dimensionality. Econometrica 65(3):487–516

  • Scott WR Jr, Kim K, Larson GD, Gurbuz AC, McClellan JH (2004) Combined seismic, radar, and induction sensor for landmine detection. In: Proc. 2004 int. IEEE geoscience and remote sensing symposium, Anchorage, 20–24 September 2004, pp 1613–1616

  • Shi L, Chen C-H (2000) A new algorithm for stochastic discrete resource allocation optimization. Discret Event Dyn Syst 10:271–294

  • Smallwood RD, Sondik EJ (1973) The optimal control of partially observable Markov processes over a finite horizon. Oper Res 21(5):1071–1088

  • Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT, Cambridge

  • Thrun S, Burgard W, Fox D (2005) Probabilistic robotics. MIT, Cambridge

  • Tijms HC (2003) A first course in stochastic models. Wiley, New York

  • Washburn R, Schneider M, Fox J (2002) Stochastic dynamic programming based approaches to sensor resource management. In: Proc. 5th int. conf. on information fusion

  • Watkins CJCH (1989) Learning from delayed rewards. PhD dissertation, King’s College, University of Cambridge

  • Willems JC (1996) 1696: the birth of optimal control. In: Proc. 35th IEEE conf. on decision and control (CDC’96), pp 1586–1587

  • Wu G, Chong EKP, Givan RL (2002) Burst-level congestion control using hindsight optimization. IEEE Trans Automat Control (Special Issue on Systems and Control Methods for Communication Networks) 47(6):979–991

  • Yu H, Bertsekas DP (2004) Discretized approximations for POMDP with average cost. In: Proc. 20th conf. on uncertainty in artificial intelligence, Banff, pp 619–627

  • Zhang NL, Liu W (1996) Planning in stochastic domains: problem characteristics and approximation. Tech. report HKUST-CS96-31, Dept. of Computer Science, Hong Kong University of Science and Technology

  • Zhang Z, Moola S, Chong EKP (2008) Approximate stochastic dynamic programming for opportunistic fair scheduling in wireless networks. In: Proc. 47th IEEE conf. on decision and control, Cancun, 9–11 December 2008, pp 1404–1409

Author information

Corresponding author

Correspondence to Edwin K. P. Chong.

Additional information

This material is based in part upon work supported by the Air Force Office of Scientific Research under Award FA9550-06-1-0324 and by DARPA under Award FA8750-05-2-0285. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the Air Force or of DARPA. Approved for Public Release, Distribution Unlimited.

About this article

Cite this article

Chong, E.K.P., Kreucher, C.M. & Hero, A.O. Partially Observable Markov Decision Process Approximations for Adaptive Sensing. Discrete Event Dyn Syst 19, 377–422 (2009). https://doi.org/10.1007/s10626-009-0071-x
