Abstract
A weakness of classical Markov decision processes (MDPs) is that they scale very poorly due to the flat state-space representation. Factored MDPs address this representational problem by exploiting problem structure to specify the transition and reward functions of an MDP in a compact manner. However, in general, solutions to factored MDPs do not retain the structure and compactness of the problem representation, forcing approximate solutions, with approximate linear programming (ALP) emerging as a promising MDP-approximation technique. To date, most ALP work has focused on the primal-LP formulation, while the dual LP, which forms the basis for solving constrained Markov problems, has received much less attention. We show that a straightforward linear approximation of the dual optimization variables is problematic, because some of the required computations cannot be carried out efficiently. Nonetheless, we develop a composite approach that symmetrically approximates the primal and dual optimization variables (effectively approximating both the objective function and the feasible region of the LP), leading to a formulation that is computationally feasible and suitable for solving constrained MDPs. We empirically show that this new ALP formulation also performs well on unconstrained problems.
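To make the primal/dual LP pair concrete, the following sketch solves the standard *exact* linear programs for a tiny flat MDP (the formulations the paper's ALP methods approximate; see Puterman 1994). The toy MDP, its numbers, and the use of `scipy.optimize.linprog` are illustrative assumptions, not the paper's factored algorithms. The primal optimizes over value functions; the dual optimizes over occupation measures, which is why it is the natural basis for constrained problems.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative toy MDP (2 states, 2 actions); all numbers are made up.
gamma = 0.9
nS, nA = 2, 2
# P[a, s, s'] = transition probability; r[s, a] = reward
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
alpha = np.array([0.5, 0.5])  # state-relevance weights / initial distribution

# Primal LP: min alpha^T v  s.t.  v(s) >= r(s,a) + gamma * sum_s' P(s'|s,a) v(s')
# linprog wants A_ub x <= b_ub, so rewrite as (gamma*P - I) v <= -r.
A_ub, b_ub = [], []
for s in range(nS):
    for a in range(nA):
        A_ub.append(gamma * P[a, s] - np.eye(nS)[s])
        b_ub.append(-r[s, a])
primal = linprog(c=alpha, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                 bounds=[(None, None)] * nS)

# Dual LP over occupation measures x(s,a):
# max sum_{s,a} r(s,a) x(s,a)
# s.t. sum_a x(s',a) - gamma * sum_{s,a} P(s'|s,a) x(s,a) = alpha(s'),  x >= 0
A_eq = np.zeros((nS, nS * nA))
for sp in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[sp, s * nA + a] = (1.0 if s == sp else 0.0) - gamma * P[a, s, sp]
dual = linprog(c=-r.flatten(), A_eq=A_eq, b_eq=alpha,
               bounds=[(0, None)] * (nS * nA))

print("primal objective:", primal.fun)
print("dual objective:  ", -dual.fun)  # strong duality: the two coincide
```

In ALP, one shrinks the primal by restricting v to a span of basis functions, v = H w; the paper's symmetric approach additionally restricts the dual variables x to a compact basis, approximating both the objective and the feasible region at once. Constraints such as resource bounds enter the dual directly as extra linear constraints on x.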
Cite this article
Dolgov, D.A., Durfee, E.H. Symmetric approximate linear programming for factored MDPs with application to constrained problems. Ann Math Artif Intell 47, 273–293 (2006). https://doi.org/10.1007/s10472-006-9038-x
Keywords
- Markov decision processes
- approximate linear programming
- primal-LP formulation
- dual LP
- constrained Markov problems