Definition
Average-reward reinforcement learning (ARL) refers to learning policies that optimize the average reward per time step by continually taking actions and observing their outcomes, including the next state and the immediate reward.
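The quantity being optimized can be written out explicitly. The following is a sketch in standard notation; the symbols (a policy \pi, a start state s_0, the reward r_t received at step t, and the gain \rho) are assumptions introduced here for illustration and do not appear in the entry itself. It also assumes that the limit exists, as it does for a stationary policy in a finite Markov decision process.

```latex
% Average reward (gain) of a policy \pi started in state s_0:
% the expected reward per time step over an ever longer horizon N.
\rho^{\pi}(s_0) \;=\; \lim_{N \to \infty} \frac{1}{N}\,
    \mathbb{E}\!\left[\, \sum_{t=0}^{N-1} r_t \;\middle|\; s_0, \pi \right]

% ARL seeks a policy that maximizes this quantity:
\pi^{*} \in \arg\max_{\pi} \rho^{\pi}(s_0)
```

For example, a policy that earns rewards 2, 0, 2, 0, ... indefinitely has a gain of 1 per step, whichever reward comes first.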
Motivation and Background
Reinforcement learning (RL) is the study of programs that improve their performance at some task by receiving rewards and punishments from the environment (Sutton & Barto, 1998). RL has been quite successful at automatically learning good procedures for complex tasks such as playing backgammon and scheduling elevators (Tesauro, 1992; Crites & Barto, 1998). In episodic domains, which have a natural termination condition such as the end of the game in backgammon, the obvious performance measure to optimize is the expected total reward per episode. But some domains such as elevator scheduling are recurrent,...
Recommended Reading
Abounadi, J., Bertsekas, D. P., & Borkar, V. (2002). Stochastic approximation for non-expansive maps: Application to Q-learning algorithms. SIAM Journal on Control and Optimization, 41(1), 1–22.
Barto, A. G., Bradtke, S. J., & Singh, S. P. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1), 81–138.
Bertsekas, D. P. (1995). Dynamic programming and optimal control. Belmont, MA: Athena Scientific.
Brafman, R. I., & Tennenholtz, M. (2002). R-MAX – a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 2, 213–231.
Crites, R. H., & Barto, A. G. (1998). Elevator group control using multiple reinforcement agents. Machine Learning, 33(2/3), 235–262.
Ghavamzadeh, M., & Mahadevan, S. (2007). Hierarchical average reward reinforcement learning. Journal of Machine Learning Research, 8, 2629–2669.
Kearns, M., & Singh, S. (2002). Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2/3), 209–232.
Mahadevan, S. (1996). Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 22(1/2/3), 159–195.
Marbach, P., Mihatsch, O., & Tsitsiklis, J. N. (2000). Call admission control and routing in integrated service networks using neuro-dynamic programming. IEEE Journal on Selected Areas in Communications, 18(2), 197–208.
Proper, S., & Tadepalli, P. (2006). Scaling model-based average-reward reinforcement learning for product delivery. In European conference on machine learning (pp. 725–742). Springer.
Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York: Wiley.
Schwartz, A. (1993). A reinforcement learning method for maximizing undiscounted rewards. In Proceedings of the tenth international conference on machine learning (pp. 298–305). San Mateo, CA: Morgan Kaufmann.
Seri, S., & Tadepalli, P. (2002). Model-based hierarchical average-reward reinforcement learning. In Proceedings of international machine learning conference (pp. 562–569). Sydney, Australia: Morgan Kaufmann.
Sutton, R., & Barto, A. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
Tadepalli, P., & Ok, D. (1998). Model-based average-reward reinforcement learning. Artificial Intelligence, 100, 177–224.
Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8(3–4), 257–277.
Tsitsiklis, J., & Van Roy, B. (1999). Average cost temporal-difference learning. Automatica, 35(11), 1799–1808.
Van Roy, B., & Tsitsiklis, J. (2002). On average versus discounted temporal-difference learning. Machine Learning, 49(2/3), 179–191.
Wang, G., & Mahadevan, S. (1999). Hierarchical optimization of policy-coupled semi-Markov decision processes. In Proceedings of the 16th international conference on machine learning (pp. 464–473). Bled, Slovenia.
© 2011 Springer Science+Business Media, LLC
Cite this entry
Tadepalli, P. (2011). Average-Reward Reinforcement Learning. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_49
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-30768-8
Online ISBN: 978-0-387-30164-8