ABSTRACT
When determining which actions to execute, reinforcement learners are constantly faced with the decision of either exploiting existing knowledge or exploring new options, risking short-term costs but potentially improving performance in the long run. This paper describes and experimentally evaluates four existing explore/exploit strategies for the learning classifier system XCS. The evaluation takes place on three well-known learning problems: two multiplexers and one maze environment. An automated parameter optimization is conducted, showing that different environments require different parametrizations of the strategies. Further, our results indicate that none of the strategies is superior to the others. It turns out that multi-step problems with sparse rewards are challenging for the selected strategies, highlighting the need to develop more reliable explore/exploit strategies to tackle such environments.
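To make the dilemma concrete, the sketch below shows one of the simplest explore/exploit mechanisms: an epsilon-greedy choice over an XCS-style prediction array. It is a minimal illustration only; the function name, the dictionary representation of the prediction array, and the epsilon values are assumptions for this example and do not correspond to any specific strategy evaluated in the paper.

```python
import random

def select_action(prediction_array, epsilon=0.5, rng=random):
    """Choose an action from an XCS-style prediction array.

    prediction_array: dict mapping each action to its system prediction
    (the fitness-weighted payoff estimate of the matching classifiers
    advocating that action).
    epsilon: probability of taking an exploratory (random) step.
    """
    actions = list(prediction_array)
    if rng.random() < epsilon:
        # Explore: pick a random action, accepting a possible
        # short-term cost to gather new payoff information.
        return rng.choice(actions)
    # Exploit: pick the action with the highest predicted payoff.
    return max(actions, key=prediction_array.get)


# Illustrative use: the learner exploits unless the coin flip says explore.
predictions = {0: 310.0, 1: 742.5}
print(select_action(predictions, epsilon=0.25))
```

In this simplified view, tuning epsilon per environment plays the role that the automated parameter optimization plays for the strategies compared in the paper.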