
On the average cost optimality equation and the structure of optimal policies for partially observable Markov decision processes

Annals of Operations Research

Abstract

We consider partially observable Markov decision processes with finite or countably infinite (core) state and observation spaces and a finite action set. Following a standard approach, an equivalent completely observed problem is formulated, with the same finite action set but with an uncountable state space, namely the space of probability distributions on the original core state space. By developing a suitable theoretical framework, it is shown that some characteristics induced in the original problem by the countability of the spaces involved carry over to the equivalent problem. Sufficient conditions are then derived for solutions to the average cost optimality equation to exist. We illustrate these results in the context of machine replacement problems. Structural properties of average cost optimal policies are obtained for a two-state replacement problem; these are similar to results available for discount optimal policies. The set of assumptions used compares favorably to others currently available.
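For orientation, the average cost optimality equation (ACOE) in the belief-state formulation typically takes the following standard form, where p is a belief (a probability distribution over core states), T(p, a, y) is the Bayes update of p after action a and observation y, σ(y | p, a) is the induced observation probability, c(p, a) is the expected one-stage cost, and the pair (ρ, h) is the unknown. The notation is generic and not necessarily that of the paper:

```latex
\rho + h(p) \;=\; \min_{a \in A} \Big[\, c(p,a) + \sum_{y} \sigma(y \mid p,a)\, h\big(T(p,a,y)\big) \Big]
```

The sketch below illustrates the belief-state reduction itself for a hypothetical two-state machine replacement model (working/failed states, operate/replace actions, noisy good/bad quality readings). All parameter names and values are invented for illustration and are not taken from the paper:

```python
import numpy as np

# Hypothetical two-state machine replacement POMDP (illustration only).
# Core states: 0 = working, 1 = failed.
# Actions:     0 = operate, 1 = replace.
# Observations: 0 = "good" reading, 1 = "bad" reading.

P = {  # P[a][x, x']: core-state transition matrix under action a
    0: np.array([[0.9, 0.1],    # operating: a working machine may fail;
                 [0.0, 1.0]]),  # a failed machine stays failed
    1: np.array([[1.0, 0.0],    # replacing restores the working state
                 [1.0, 0.0]]),
}
Q = np.array([[0.8, 0.2],   # Q[x, y]: probability of reading y given
              [0.3, 0.7]])  # the new core state x

def belief_update(p, a, y):
    """T(p, a, y): Bayes update of the belief p over core states.
    The posterior is the 'state' of the equivalent completely
    observed problem."""
    unnormalized = Q[:, y] * (p @ P[a])   # predict, then correct
    return unnormalized / unnormalized.sum()

def observation_prob(p, a):
    """sigma(. | p, a): distribution of the next observation."""
    return (p @ P[a]) @ Q

# Example: start certain the machine works, operate, read "bad".
p0 = np.array([1.0, 0.0])
print(belief_update(p0, a=0, y=1))   # posterior: [0.72, 0.28]
print(observation_prob(p0, a=0))     # readings:  [0.75, 0.25]
```

In the two-state case the belief space reduces to the unit interval (the probability that the machine has failed), which is what makes threshold-type structural results of the kind described in the abstract natural to seek.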




Additional information

This research was supported in part by the Advanced Technology Program of the State of Texas, in part by the Air Force Office of Scientific Research under Grant AFOSR-86-0029, in part by the National Science Foundation under Grant ECS-8617860, and in part by the Air Force Office of Scientific Research (AFSC) under Contract F49620-89-C-0044.


Cite this article

Fernández-Gaucherand, E., Arapostathis, A. & Marcus, S.I. On the average cost optimality equation and the structure of optimal policies for partially observable Markov decision processes. Ann Oper Res 29, 439–469 (1991). https://doi.org/10.1007/BF02283610
