Abstract
Aiming to clarify the conditions under which the reinforcement process of a Learning Classifier System (LCS) converges or diverges, this paper explores: (1) an extreme condition under which the reinforcement process of an LCS diverges; and (2) methods to avoid such divergence. Building on our previous work, which showed the equivalence between the reinforcement process of an LCS and reinforcement learning (RL) with function approximation (FA), we present a counter example for an LCS with the Q-bucket-brigade based on the 11-state star problem, a counter example originally proposed to show the divergence of Q-learning with linear FA. Empirical results obtained by applying the counter example to the LCS confirmed the theoretical predictions: (1) the LCS with the Q-bucket-brigade diverged under prediction problems, in which the action-selection policy is fixed; and (2) this divergence was avoided by using the implicit-bucket-brigade or by applying the residual gradient algorithm to the Q-bucket-brigade.
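The sketch below illustrates the kind of divergence the abstract refers to. It is not the paper's 11-state Q-learning construction; as an assumption it uses the widely known 7-state prediction variant of Baird's star problem, with a direct (semi-gradient) update standing in for the Q-bucket-brigade and Baird's residual gradient update standing in for the proposed fix. The learning rate, sweep count, and initial weights are illustrative choices.

# Minimal sketch (assumption: not the paper's exact 11-state Q-learning setup).
# Baird-style star problem with linear function approximation, comparing the
# direct (semi-gradient) update with Baird's residual gradient update under a
# fixed policy, so the task is a pure prediction problem.
import numpy as np

GAMMA = 0.99   # discount factor
ALPHA = 0.01   # learning rate (illustrative choice)

# Features: 7 states, 8 weights (Baird's over-parameterised construction).
# States 0..5: v(s) = 2*w[s] + w[7]; state 6: v(6) = w[6] + 2*w[7].
X = np.zeros((7, 8))
for s in range(6):
    X[s, s] = 2.0
    X[s, 7] = 1.0
X[6, 6] = 1.0
X[6, 7] = 2.0

def sweep(w, residual):
    """One synchronous sweep over all 7 states.

    The fixed policy always takes the 'solid' action, which moves to state 6
    with reward 0, so the true value of every state is 0.
    """
    delta_w = np.zeros_like(w)
    for s in range(7):
        td_error = 0.0 + GAMMA * (X[6] @ w) - (X[s] @ w)
        if residual:
            # Residual gradient: true gradient descent on the squared
            # Bellman error, so the successor features enter the update.
            delta_w += ALPHA * td_error * (X[s] - GAMMA * X[6])
        else:
            # Direct (semi-gradient) update: the bootstrapped target is
            # treated as a constant; this is the update that can diverge.
            delta_w += ALPHA * td_error * X[s]
    return w + delta_w

for residual in (False, True):
    w = np.ones(8)
    w[6] = 10.0  # Baird's usual initial weights
    for _ in range(2000):
        w = sweep(w, residual)
    label = "residual gradient" if residual else "direct (semi-gradient)"
    print(f"{label:24s} max |w| after 2000 sweeps: {np.abs(w).max():.3g}")

Under these assumptions the direct update grows without bound, whereas the residual gradient update keeps the weights bounded while the value estimates drift toward the true values (all zero), mirroring the divergence/avoidance contrast the abstract describes for the Q-bucket-brigade and its residual gradient variant.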
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Wada, A., Takadama, K., Shimohara, K. (2007). Counter Example for Q-Bucket-Brigade Under Prediction Problem. In: Kovacs, T., Llorà, X., Takadama, K., Lanzi, P.L., Stolzmann, W., Wilson, S.W. (eds) Learning Classifier Systems. IWLCS 2003, 2004, 2005. Lecture Notes in Computer Science, vol 4399. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71231-2_10
DOI: https://doi.org/10.1007/978-3-540-71231-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71230-5
Online ISBN: 978-3-540-71231-2
eBook Packages: Computer Science, Computer Science (R0)