Definition
Adaptive Real-Time Dynamic Programming (ARTDP) is an algorithm that allows an agent to improve its behavior while interacting over time with an incompletely known dynamic environment. It can also be viewed as a heuristic search algorithm for finding shortest paths in incompletely known stochastic domains. ARTDP is based on Dynamic Programming (DP), but unlike conventional DP, which consists of off-line algorithms, ARTDP is an on-line algorithm because it uses agent behavior to guide its computation. ARTDP is adaptive because it does not need a complete and accurate model of the environment but learns a model from data collected during agent-environment interaction. When a good model is available, Real-Time Dynamic Programming (RTDP) is applicable, which is ARTDP without the model-learning component.
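The loop described above can be illustrated with a minimal sketch. It is not the published algorithm in full detail. It assumes a hypothetical five-state chain environment (`chain_step`) in which every step costs 1 and moving right succeeds with probability 0.8, a maximum-likelihood model built from transition counts, cost-to-go estimates initialised to zero, and purely greedy action selection in which untried actions are treated as costless so that each is tried at least once. All names and parameters here are illustrative and do not come from the entry.

```python
import random
from collections import defaultdict

GOAL = 4  # hypothetical 5-state chain: states 0..4, goal at 4


def chain_step(state, action, rng):
    """Hypothetical environment, unknown to the agent. Returns (next_state, cost)."""
    if action == "right":
        # Moving right succeeds with probability 0.8; otherwise the agent stays put.
        nxt = min(state + 1, GOAL) if rng.random() < 0.8 else state
    else:
        nxt = max(state - 1, 0)
    return nxt, 1.0  # every step costs 1


def artdp(step, states, actions, start, goal, trials=300, max_steps=200, seed=0):
    """Sketch of ARTDP: learn a model from experience and back up only visited states."""
    rng = random.Random(seed)
    next_counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
    cost_sum = defaultdict(float)                        # (s, a) -> total observed cost
    tries = defaultdict(int)                             # (s, a) -> number of samples
    V = {s: 0.0 for s in states}  # cost-to-go estimates, initialised optimistically

    def q(s, a):
        n = tries[(s, a)]
        if n == 0:
            return 0.0  # assumption: untried actions look free, which prompts trying them
        expected_next = sum(c * V[s2] for s2, c in next_counts[(s, a)].items()) / n
        return cost_sum[(s, a)] / n + expected_next

    for _ in range(trials):
        s = start
        for _ in range(max_steps):
            if s == goal:
                break
            qs = [q(s, a) for a in actions]
            V[s] = min(qs)                  # DP backup at the current state only
            a = actions[qs.index(min(qs))]  # greedy action under the learned model
            s2, cost = step(s, a, rng)      # act in the real environment
            next_counts[(s, a)][s2] += 1    # update the maximum-likelihood model
            cost_sum[(s, a)] += cost
            tries[(s, a)] += 1
            s = s2
    return V


V = artdp(chain_step, states=range(GOAL + 1), actions=["right", "left"], start=0, goal=GOAL)
```

In this toy problem the true expected cost-to-go from state 0 is 4 / 0.8 = 5, so the learned estimate `V[0]` should approach that value as the transition counts accumulate. Replacing the learned counts with the known transition probabilities would turn the same loop into a sketch of RTDP, which is ARTDP without the model-learning component.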
Motivation and Background
RTDP combines strengths of heuristic search and DP. Like heuristic search – and unlike conventional DP – it does not have to evaluate the...
Recommended Reading
Barto, A., Bradtke, S., & Singh, S. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1–2), 81–138.
Bertsekas, D., & Tsitsiklis, J. (1989). Parallel and distributed computation: Numerical methods. Englewood Cliffs, NJ: Prentice-Hall.
Bonet, B., & Geffner, H. (2003a). Labeled RTDP: Improving the convergence of real-time dynamic programming. In Proceedings of the 13th international conference on automated planning and scheduling (ICAPS-2003). Trento, Italy.
Bonet, B., & Geffner, H. (2003b). Faster heuristic search algorithms for planning with uncertainty and full feedback. In Proceedings of the international joint conference on artificial intelligence (IJCAI-2003). Acapulco, Mexico.
Feng, Z., Hansen, E., & Zilberstein, S. (2003). Symbolic generalization for on-line planning. In Proceedings of the 19th conference on uncertainty in artificial intelligence. Acapulco, Mexico.
Hansen, E., & Zilberstein, S. (2001). LAO*: A heuristic search algorithm that finds solutions with loops. Artificial Intelligence, 129, 35–62.
Jalali, A., & Ferguson, M. (1989). Computationally efficient control algorithms for Markov chains. In Proceedings of the 28th conference on decision and control (pp. 1283–1288). Tampa, FL.
Korf, R. (1990). Real-time heuristic search. Artificial Intelligence, 42(2–3), 189–211.
Smith, T., & Simmons, R. (2006). Focused real-time dynamic programming for MDPs: Squeezing more out of a heuristic. In Proceedings of the national conference on artificial intelligence (AAAI). Boston, MA: AAAI Press.
Sutton, R. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the 7th international conference on machine learning (pp. 216–224). San Mateo, CA: Morgan Kaufmann.
© 2011 Springer Science+Business Media, LLC
Barto, A.G. (2011). Adaptive Real-Time Dynamic Programming. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_10
DOI: https://doi.org/10.1007/978-0-387-30164-8_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-30768-8
Online ISBN: 978-0-387-30164-8
eBook Packages: Computer Science; Reference Module Computer Science and Engineering