Abstract
Real-time dynamic programming (RTDP) solves Markov decision processes (MDPs) when the initial state is restricted. By visiting (and updating) only a fraction of the state space, it can solve problems whose state spaces are intractably large. To improve the performance of RTDP, a variant based on a symbolic representation, named sRTDP, was previously proposed. Traditional RTDP works best on problems with sparse transition matrices, where it can often reach ε-convergence efficiently without visiting all states; on problems with dense transition matrices, where most states are reachable in one step, we show in this paper that sRTDP outperforms traditional RTDP by up to three orders of magnitude. We also specify a new variant of sRTDP based on BRTDP, named sBRTDP, which converges faster than other RTDP variants because it performs fewer updates by making a better choice of the next state to visit.
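Since the abstract only names these algorithms, a minimal tabular sketch of an RTDP trial and of the BRTDP-style successor choice it alludes to may help fix ideas. This is illustrative only: the paper's sRTDP and sBRTDP operate on symbolic (ADD-based) representations, which are not shown here, and the MDP interface (`actions`, `cost`, `transition`, `is_goal`) and heuristic value initialization are assumptions made for this example.

```python
import random

def rtdp_trial(V, mdp, s0, max_depth=1000):
    """One RTDP trial: act greedily from s0, backing up only the visited states.
    V is assumed to be a dict initialized with an admissible heuristic."""
    s = s0
    for _ in range(max_depth):
        if mdp.is_goal(s):
            break
        # Bellman backup restricted to the current state:
        #   Q(s,a) = cost(s,a) + sum_{s'} P(s'|s,a) * V(s')
        q = {a: mdp.cost(s, a) + sum(p * V[s2] for s2, p in mdp.transition(s, a))
             for a in mdp.actions(s)}
        a = min(q, key=q.get)
        V[s] = q[a]                      # update only the visited state
        # Plain RTDP samples the successor from the transition distribution.
        succs = mdp.transition(s, a)     # list of (next_state, probability)
        s = random.choices([s2 for s2, _ in succs],
                           weights=[p for _, p in succs])[0]

def brtdp_pick_successor(Vu, Vl, succs):
    """BRTDP-style choice: weight each successor by its probability times the
    gap between upper and lower value bounds, so search effort is directed
    to states whose values are least known."""
    weights = [p * (Vu[s2] - Vl[s2]) for s2, p in succs]
    if sum(weights) == 0:                # all successors have converged bounds
        return None
    return random.choices([s2 for s2, _ in succs], weights=weights)[0]
```

Replacing the plain sampling step in `rtdp_trial` with `brtdp_pick_successor` is what lets BRTDP (and, symbolically, sBRTDP) avoid updating states whose values are already tightly bounded.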
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Delgado, K.V., Fang, C., Sanner, S., de Barros, L.N. (2010). Symbolic Bounded Real-Time Dynamic Programming. In: da Rocha Costa, A.C., Vicari, R.M., Tonidandel, F. (eds) Advances in Artificial Intelligence – SBIA 2010. SBIA 2010. Lecture Notes in Computer Science(), vol 6404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16138-4_20
DOI: https://doi.org/10.1007/978-3-642-16138-4_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16137-7
Online ISBN: 978-3-642-16138-4