Abstract
This paper presents two methods to accelerate LC-learning, a novel model-based average reward reinforcement learning method for computing a bias-optimal policy in a cyclic domain. LC-learning computes the bias-optimal policy exactly, without any approximation, by exploiting the observation that only the optimal cycle needs to be found to obtain a gain-optimal policy. However, its complexity is large, since it searches most combinations of actions to detect all cycles. In this paper, we first introduce two pruning methods that prevent the state explosion problem of LC-learning. Second, we compare the improved LC-learning with one of the fastest existing methods, Prioritized Sweeping, on a bus scheduling task. We show that LC-learning computes the bias-optimal policy more quickly than standard Prioritized Sweeping, and that it performs as well as the fully tuned version in the medium-sized case.
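The core idea the abstract relies on, that a gain-optimal policy can be found by searching for the cycle with the highest average reward, can be illustrated with a small sketch. The paper's own algorithm and pruning rules are not reproduced here; the function below is a hypothetical illustration, assuming a deterministic MDP given as a mapping from each state to its `(next_state, reward)` action outcomes, and it simply enumerates all simple cycles to report the best gain (average reward per step around the cycle):

```python
def best_cycle_gain(transitions):
    """Return the maximum average reward (gain) over any simple cycle
    of a deterministic MDP.

    transitions: dict mapping state -> list of (next_state, reward) pairs,
    one pair per available action. Illustrative sketch only; the paper's
    LC-learning adds pruning to avoid enumerating every combination.
    """
    best = float("-inf")

    def dfs(start, state, length, total, visited):
        nonlocal best
        for nxt, r in transitions[state]:
            if nxt == start:
                # Closed a cycle back to the starting state: its gain is
                # the summed reward divided by the number of transitions.
                best = max(best, (total + r) / (length + 1))
            elif nxt not in visited:
                # Extend the path only to unvisited states (simple cycles).
                dfs(start, nxt, length + 1, total + r, visited | {nxt})

    for s in transitions:
        dfs(s, s, 0, 0.0, {s})
    return best
```

For example, in a two-state MDP where the self-loop at `a` pays 1 but the round trip `a -> b -> a` pays 1 + 3, the round-trip cycle wins with a gain of 2 per step. The exhaustive search above is exponential in general, which is exactly the cost the paper's two pruning methods are designed to reduce.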
References
Leslie P. Kaelbling, Michael L. Littman, and Andrew P. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
Taro Konda and Tomohiro Yamaguchi. LC-learning: In-stages model-based average reward reinforcement learning foundations. In Proceedings of the Seventh Pacific Rim International Conference on Artificial Intelligence (PRICAI-2002), 2002.
Sridhar Mahadevan. An average-reward reinforcement learning algorithm for computing bias-optimal policies. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), pages 875–880, 1996.
Sridhar Mahadevan. Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 22(1-3):159–195, 1996.
Toshimi Minoura, S. Choi, and R. Robinson. Structural active-object systems for manufacturing control. Integrated Computer-Aided Engineering, 1(2):121–136, 1993.
Andrew W. Moore and Christopher G. Atkeson. Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13:103–130, 1993.
C. H. Papadimitriou and J. N. Tsitsiklis. The complexity of Markov decision processes. Mathematics of Operations Research, 12(3):441–450, 1987.
Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley, 1994.
Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
Prasad Tadepalli and DoKyeong Ok. Model-based average reward reinforcement learning. Artificial Intelligence, 100(1-2):177–223, 1998.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Konda, T., Tensyo, S., Yamaguchi, T. (2002). LC-Learning: Phased Method for Average Reward Reinforcement Learning — Preliminary Results. In: Ishizuka, M., Sattar, A. (eds) PRICAI 2002: Trends in Artificial Intelligence. PRICAI 2002. Lecture Notes in Computer Science, vol 2417. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45683-X_24
Print ISBN: 978-3-540-44038-3
Online ISBN: 978-3-540-45683-4