Partially Observable Markov Decision Process Approximations for Adaptive Sensing

Abstract

Adaptive sensing involves actively managing sensor resources to achieve a sensing task, such as object detection, classification, or tracking, and represents a promising direction for new applications of discrete event system methods. We describe an approach to adaptive sensing based on approximately solving a partially observable Markov decision process (POMDP) formulation of the problem. Such approximations are necessary because the very large state spaces of practical adaptive sensing problems preclude exact computation of optimal solutions. We review the theory of POMDPs and show how it applies to adaptive sensing problems. We then describe a variety of approximation methods, with examples illustrating their application to adaptive sensing. The examples also demonstrate the gains that nonmyopic methods can achieve relative to myopic methods, and highlight how those gains depend on the sensing resources and the environment.
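
Among the approximation methods the paper describes are rollout-type lookahead policies (cf. Bertsekas and Castanon 1999 and Chang et al. 2004 in the references below). As an illustrative sketch only, and not the authors' implementation, the following Python fragment estimates Q-values by Monte Carlo rollout of a base policy; the simulator interface step(belief, action) -> (next_belief, reward) and the name base_policy are hypothetical assumptions, standing in for whatever tracking simulation is at hand.

# Illustrative sketch only (not the paper's code): Monte Carlo rollout
# estimation of Q-values for a generic simulation model of a POMDP.
# The interface -- step(belief, action) returning (next_belief, reward),
# and base_policy(belief) returning an action -- is a hypothetical assumption.

def rollout_q(belief, action, step, base_policy, horizon, num_runs=100):
    """Estimate Q(belief, action): apply `action` once, then follow the
    (typically myopic) base policy for the remaining horizon, averaging
    the accumulated reward over `num_runs` simulated trajectories."""
    total = 0.0
    for _ in range(num_runs):
        b, r = step(belief, action)       # candidate action first
        ret = r
        for _ in range(horizon - 1):      # then follow the base policy
            b, r = step(b, base_policy(b))
            ret += r
        total += ret
    return total / num_runs

def rollout_action(belief, actions, step, base_policy, horizon, num_runs=100):
    """Nonmyopic choice: the action maximizing the rollout Q-value estimate."""
    return max(actions, key=lambda a: rollout_q(belief, a, step, base_policy,
                                                horizon, num_runs))

Under exact Q-value evaluation, a rollout policy performs at least as well as its base policy; the Monte Carlo averaging above trades that guarantee for tractability, which is the flavor of the myopic-versus-nonmyopic comparisons referred to in the abstract.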

Notes

  1. For the case where \(\mathcal{S}\) represents target kinematic states in Cartesian coordinates, we typically use the Euclidean norm for this metric.

  2. In fact, given a POMDP, the Q-value can be viewed as the objective function value for a related problem; see Bertsekas and Tsitsiklis (1996).
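
For concreteness, the Q-value mentioned in this note can be stated in its standard belief-state form; the notation here is generic (in the style of Smallwood and Sondik 1973), not quoted from the paper. With belief state \(b\), action \(a\), observation \(z\), expected immediate reward \(r(b,a)=\sum_{s\in\mathcal{S}} b(s)\,r(s,a)\), and Bayes belief update \(\tau(b,a,z)\),

\[ Q(b,a) \;=\; r(b,a) + \sum_{z} P(z \mid b,a)\, V^*\big(\tau(b,a,z)\big), \]

where \(V^*\) is the optimal value function of the belief-state MDP. Approximating \(V^*\) (e.g., by rollout or sparse sampling) yields the approximate Q-values on which the nonmyopic methods in the paper are built.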

References

  • Altman E (1998) Constrained Markov decision processes. Chapman and Hall/CRC, London

  • Bartels R, Backus S, Zeek E, Misoguti L, Vdovin G, Christov IP, Murnane MM, Kapteyn HC (2000) Shaped-pulse optimization of coherent soft X-rays. Nature 406:164–166

  • Bellman R (1957) Dynamic programming. Princeton University Press, Princeton

  • Bertsekas DP (2005) Dynamic programming and suboptimal control: a survey from ADP to MPC. In: Proc. joint 44th IEEE conf. on decision and control and European control conf., Seville, 12–15 December 2005

  • Bertsekas DP (2007) Dynamic programming and optimal control, vol I, 3rd edn, 2005; vol II, 3rd edn. Athena Scientific, Belmont

  • Bertsekas DP, Castanon DA (1999) Rollout algorithms for stochastic scheduling problems. Journal of Heuristics 5:89–108

  • Bertsekas DP, Tsitsiklis JN (1996) Neuro-dynamic programming. Athena Scientific, Belmont

  • Blatt D, Hero AO III (2006a) From weighted classification to policy search. In: Advances in neural information processing systems (NIPS) vol 18, pp 139–146

  • Blatt D, Hero AO III (2006b) Optimal sensor scheduling via classification reduction of policy search (CROPS). In: Proc. int. conf. on automated planning and scheduling (ICAPS)

  • Castanon D (1997) Approximate dynamic programming for sensor management. In: Proc. 36th IEEE conf. on decision and control, San Diego, pp 1202–1207

  • Chang HS, Givan RL, Chong EKP (2004) Parallel rollout for online solution of partially observable Markov decision processes. Discret Event Dyn Syst 14(3):309–341

  • Chang HS, Fu MC, Hu J, Marcus SI (2007) Simulation-based algorithms for Markov decision processes. Springer series in communications and control engineering. Springer, Berlin Heidelberg New York

  • Chen RC, Wagner K (2007) Constrained partially observed Markov decision processes for adaptive waveform scheduling. In: Proc. int. conf. on electromagnetics in advanced applications, Torino, 17–21 September 2007, pp 454–463

  • Cheng HT (1988) Algorithms for partially observable Markov decision processes. PhD dissertation, University of British Columbia

  • Chhetri A, Morrell D, Papandreou-Suppappola A (2004) Efficient search strategies for non-myopic sensor scheduling in target tracking. In: Asilomar conf. on signals, systems, and computers

  • Chong EKP, Givan RL, Chang HS (2000) A framework for simulation-based network control via hindsight optimization. In: Proc. 39th IEEE conf. on decision and control, Sydney, 12–15 December 2000, pp 1433–1438

  • de Farias DP, Van Roy B (2003) The linear programming approach to approximate dynamic programming. Oper Res 51(6):850–865

  • de Farias DP, Van Roy B (2004) On constraint sampling in the linear programming approach to approximate dynamic programming. Math Oper Res 29(3):462–478

  • Gottlieb E, Harrigan R (2001) The Umbra simulation framework. Sandia Tech Report SAND2001-1533 (Unlimited Release)

  • He Y, Chong EKP (2004) Sensor scheduling for target tracking in sensor networks. In: Proc. 43rd IEEE conf. on decision and control (CDC’04), 14–17 December 2004, pp 743–748

  • He Y, Chong EKP (2006) Sensor scheduling for target tracking: a Monte Carlo sampling approach. Digit Signal Process 16(5):533–545

  • Hero A, Castanon D, Cochran D, Kastella K (eds) (2008) Foundations and applications of sensor management. Springer, Berlin Heidelberg New York

  • Ji S, Parr R, Carin L (2007) Nonmyopic multiaspect sensing with partially observable Markov decision processes. IEEE Trans Signal Process 55(6):2720–2730 (Part 1)

  • Julier S, Uhlmann J (2004) Unscented filtering and nonlinear estimation. Proc IEEE 92(3):401–422

  • Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: a survey. J Artif Intell Res 4:237–285

  • Kaelbling LP, Littman ML, Cassandra AR (1998) Planning and acting in partially observable stochastic domains. Artif Intell 101:99–134

  • Kearns MJ, Mansour Y, Ng AY (1999) A sparse sampling algorithm for near-optimal planning in large Markov decision processes. In: Proc. 16th int. joint conf. on artificial intelligence, pp 1324–1331

  • Krakow LW, Li Y, Chong EKP, Groom KN, Harrington J, Rigdon B (2006) Control of perimeter surveillance wireless sensor networks via partially observable Markov decision process. In: Proc. 2006 IEEE int. Carnahan conf. on security technology (ICCST), Lexington, 17–20 October 2006

  • Kreucher CM, Hero A, Kastella K (2005a) A comparison of task driven and information driven sensor management for target tracking. In: Proc. 44th IEEE conf. on decision and control (CDC’05), 12–15 December 2005

  • Kreucher CM, Kastella K, Hero AO III (2005b) Sensor management using an active sensing approach. Signal Process 85(3):607–624

  • Kreucher CM, Kastella K, Hero AO III (2005c) Multitarget tracking using the joint multitarget probability density. IEEE Trans Aerosp Electron Syst 41(4):1396–1414

  • Kreucher CM, Blatt D, Hero AO III, Kastella K (2006) Adaptive multi-modality sensor scheduling for detection and tracking of smart targets. Digit Signal Process 16:546–567

  • Kreucher CM, Hero AO III, Kastella K, Chang D (2004) Efficient methods of non-myopic sensor management for multitarget tracking. In: Proc. 43rd IEEE conf. on decision and control (CDC’04), 14–17 December 2004

  • Krishnamurthy V (2005) Emission management for low probability intercept sensors in network centric warfare. IEEE Trans Aerosp Electron Syst 41(1):133–151

  • Krishnamurthy V, Evans RJ (2001) Hidden Markov model multiarm bandits: a methodology for beam scheduling in multitarget tracking. IEEE Trans Signal Process 49(12):2893–2908

  • Li Y, Krakow LW, Chong EKP, Groom KN (2006) Dynamic sensor management for multisensor multitarget tracking. In: Proc. 40th annual conf. on information sciences and systems, Princeton, 22–24 March 2006, pp 1397–1402

  • Li Y, Krakow LW, Chong EKP, Groom KN (2007) Approximate stochastic dynamic programming for sensor scheduling to track multiple targets. Digit Signal Process. doi:10.1016/j.dsp.2007.05.004

  • Lovejoy WS (1991a) Computationally feasible bounds for partially observed Markov decision processes. Oper Res 39:162–175

  • Lovejoy WS (1991b) A survey of algorithmic methods for partially observed Markov decision processes. Ann Oper Res 28(1):47–65

  • Miller SA, Harris ZA, Chong EKP (2009) A POMDP framework for coordinated guidance of autonomous UAVs for multitarget tracking. EURASIP J Appl Signal Process (Special Issue on Signal Processing Advances in Robots and Autonomy). doi:10.1155/2009/724597

  • Pontryagin LS, Boltyansky VG, Gamkrelidze RV, Mishchenko EF (1962) The mathematical theory of optimal processes. Wiley, New York

  • Powell WB (2007) Approximate dynamic programming: solving the curses of dimensionality. Wiley-Interscience, New York

  • Ristic B, Arulampalam S, Gordon N (2004) Beyond the Kalman filter: particle filters for tracking applications. Artech House, Norwood

  • Roy N, Gordon G, Thrun S (2005) Finding approximate POMDP solutions through belief compression. J Artif Intell Res 23:1–40

  • Rust J (1997) Using randomization to break the curse of dimensionality. Econometrica 65(3):487–516

  • Scott WR Jr, Kim K, Larson GD, Gurbuz AC, McClellan JH (2004) Combined seismic, radar, and induction sensor for landmine detection. In: Proc. 2004 int. IEEE geoscience and remote sensing symposium, Anchorage, 20–24 September 2004, pp 1613–1616

  • Shi L, Chen C-H (2000) A new algorithm for stochastic discrete resource allocation optimization. Discret Event Dyn Syst 10:271–294

  • Smallwood RD, Sondik EJ (1973) The optimal control of partially observable Markov processes over a finite horizon. Oper Res 21(5):1071–1088

  • Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT, Cambridge

  • Thrun S, Burgard W, Fox D (2005) Probabilistic robotics. MIT, Cambridge

  • Tijms HC (2003) A first course in stochastic models. Wiley, New York

  • Washburn R, Schneider M, Fox J (2002) Stochastic dynamic programming based approaches to sensor resource management. In: Proc. 5th int. conf. on information fusion

  • Watkins CJCH (1989) Learning from delayed rewards. PhD dissertation, King’s College, University of Cambridge

  • Willems JC (1996) 1696: the birth of optimal control. In: Proc. 35th IEEE conf. on decision and control (CDC’96), pp 1586–1587

  • Wu G, Chong EKP, Givan RL (2002) Burst-level congestion control using hindsight optimization. IEEE Trans Automat Control (Special Issue on Systems and Control Methods for Communication Networks) 47(6):979–991

  • Yu H, Bertsekas DP (2004) Discretized approximations for POMDP with average cost. In: Proc. 20th conf. on uncertainty in artificial intelligence, Banff, pp 619–627

  • Zhang NL, Liu W (1996) Planning in stochastic domains: problem characteristics and approximation. Tech. report HKUST-CS96-31, Dept. of Computer Science, Hong Kong University of Science and Technology

  • Zhang Z, Moola S, Chong EKP (2008) Approximate stochastic dynamic programming for opportunistic fair scheduling in wireless networks. In: Proc. 47th IEEE conf. on decision and control, Cancun, 9–11 December 2008, pp 1404–1409

Author information

Corresponding author

Correspondence to Edwin K. P. Chong.

Additional information

This material is based in part upon work supported by the Air Force Office of Scientific Research under Award FA9550-06-1-0324 and by DARPA under Award FA8750-05-2-0285. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the Air Force or of DARPA. Approved for Public Release, Distribution Unlimited.

About this article

Cite this article

Chong, E.K.P., Kreucher, C.M. & Hero, A.O. Partially Observable Markov Decision Process Approximations for Adaptive Sensing. Discrete Event Dyn Syst 19, 377–422 (2009). https://doi.org/10.1007/s10626-009-0071-x
