Abstract
The goal in sequential decision making under uncertainty is to find good or optimal policies for selecting actions in stochastic environments so as to achieve a long-term goal; such problems are typically modeled as Markov decision processes (MDPs). A key concept in MDPs is the value function, a real-valued function that summarizes the long-term goodness of a decision into a single number and allows optimal decision making to be formulated as an optimization problem. Exact representation of value functions in large real-world problems is infeasible; therefore, a large body of research has been devoted to value-function approximation methods, which sacrifice some representation accuracy for the sake of scalability. These methods have proved effective for deriving good policies in hard decision problems and have laid the foundation for efficient reinforcement learning algorithms, which learn good policies in unknown stochastic environments through interaction.
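To make these ideas concrete, the minimal sketch below (not part of the original entry) approximates the value function of a fixed policy on a small random-walk chain using a linear, state-aggregation feature map and semi-gradient TD(0) updates; the environment, feature construction, and step size are illustrative assumptions, not a prescribed method.

```python
import numpy as np

# Illustrative sketch: linear value-function approximation for a fixed policy
# on a small random-walk chain, learned with semi-gradient TD(0).
# All constants and the environment below are assumptions chosen for clarity.

N_STATES = 19          # non-terminal states of the chain
GAMMA = 1.0            # undiscounted episodic task
ALPHA = 0.05           # TD step size
N_FEATURES = 5         # coarse state-aggregation features (far fewer than states)

def features(state):
    """Map a state index to a small feature vector (state aggregation)."""
    phi = np.zeros(N_FEATURES)
    phi[state * N_FEATURES // N_STATES] = 1.0
    return phi

def episode(rng):
    """Generate one episode of the random walk under the uniform random policy."""
    s = N_STATES // 2
    while True:
        s_next = s + rng.choice([-1, 1])
        if s_next < 0:                 # fell off the left end: reward -1, terminate
            yield s, -1.0, None
            return
        if s_next >= N_STATES:         # fell off the right end: reward +1, terminate
            yield s, +1.0, None
            return
        yield s, 0.0, s_next
        s = s_next

rng = np.random.default_rng(0)
w = np.zeros(N_FEATURES)               # weights of the approximate value function
for _ in range(5000):
    for s, r, s_next in episode(rng):
        v_next = 0.0 if s_next is None else features(s_next) @ w
        td_error = r + GAMMA * v_next - features(s) @ w
        w += ALPHA * td_error * features(s)   # semi-gradient TD(0) update

print("Approximate values:", [round(float(features(s) @ w), 2) for s in range(N_STATES)])
```

Because the feature map aggregates several states into one weight, the learned values are only approximately correct, illustrating the accuracy-for-scalability trade-off described above.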