Abstract
Aiming to clarify the conditions under which the reinforcement process of a Learning Classifier System (LCS) converges or diverges, this paper explores: (1) an extreme condition under which the reinforcement process of an LCS diverges; and (2) methods to avoid such divergence. Building on our previous work, which showed the equivalence between the reinforcement process of an LCS and reinforcement learning (RL) with function approximation (FA), we present a counter example for an LCS with the Q-bucket-brigade based on the 11-state star problem, a counter example originally proposed to show the divergence of Q-learning with linear FA. Empirical results obtained by applying the counter example to the LCS confirmed the theoretical predictions: (1) the LCS with the Q-bucket-brigade diverged under prediction problems, in which the action-selection policy is fixed; and (2) this divergence was avoided by using the implicit-bucket-brigade or by applying the residual gradient algorithm to the Q-bucket-brigade.
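The sketch below illustrates the kind of divergence the abstract refers to. It is not the paper's 11-state Q-learning construction; as an assumption it uses the widely known 7-state prediction variant of Baird's star problem, with a direct (semi-gradient) update standing in for the Q-bucket-brigade and Baird's residual gradient update standing in for the proposed fix. The learning rate, sweep count, and initial weights are illustrative choices.

# Minimal sketch (assumption: not the paper's exact 11-state Q-learning setup).
# Baird-style star problem with linear function approximation, comparing the
# direct (semi-gradient) update with Baird's residual gradient update under a
# fixed policy, so the task is a pure prediction problem.
import numpy as np

GAMMA = 0.99   # discount factor
ALPHA = 0.01   # learning rate (illustrative choice)

# Features: 7 states, 8 weights (Baird's over-parameterised construction).
# States 0..5: v(s) = 2*w[s] + w[7]; state 6: v(6) = w[6] + 2*w[7].
X = np.zeros((7, 8))
for s in range(6):
    X[s, s] = 2.0
    X[s, 7] = 1.0
X[6, 6] = 1.0
X[6, 7] = 2.0

def sweep(w, residual):
    """One synchronous sweep over all 7 states.

    The fixed policy always takes the 'solid' action, which moves to state 6
    with reward 0, so the true value of every state is 0.
    """
    delta_w = np.zeros_like(w)
    for s in range(7):
        td_error = 0.0 + GAMMA * (X[6] @ w) - (X[s] @ w)
        if residual:
            # Residual gradient: true gradient descent on the squared
            # Bellman error, so the successor features enter the update.
            delta_w += ALPHA * td_error * (X[s] - GAMMA * X[6])
        else:
            # Direct (semi-gradient) update: the bootstrapped target is
            # treated as a constant; this is the update that can diverge.
            delta_w += ALPHA * td_error * X[s]
    return w + delta_w

for residual in (False, True):
    w = np.ones(8)
    w[6] = 10.0  # Baird's usual initial weights
    for _ in range(2000):
        w = sweep(w, residual)
    label = "residual gradient" if residual else "direct (semi-gradient)"
    print(f"{label:24s} max |w| after 2000 sweeps: {np.abs(w).max():.3g}")

Under these assumptions the direct update grows without bound, whereas the residual gradient update keeps the weights bounded while the value estimates drift toward the true values (all zero), mirroring the divergence/avoidance contrast the abstract describes for the Q-bucket-brigade and its residual gradient variant.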
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Wada, A., Takadama, K., Shimohara, K. (2007). Counter Example for Q-Bucket-Brigade Under Prediction Problem. In: Kovacs, T., Llorà, X., Takadama, K., Lanzi, P.L., Stolzmann, W., Wilson, S.W. (eds) Learning Classifier Systems. IWLCS 2003, 2004, 2005. Lecture Notes in Computer Science, vol 4399. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71231-2_10
DOI: https://doi.org/10.1007/978-3-540-71231-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71230-5
Online ISBN: 978-3-540-71231-2
eBook Packages: Computer Science, Computer Science (R0)