ABSTRACT
When determining which actions to execute, reinforcement learners are constantly faced with the decision of either exploiting existing knowledge or exploring new options, risking short-term costs but potentially improving performance in the long run. This paper describes and experimentally evaluates four existing explore/exploit strategies for the learning classifier system XCS. The evaluation takes place on three well-known learning problems: two multiplexers and one maze environment. An automated parameter optimization is conducted, showing that different environments require different parametrizations of the strategies. Further, our results indicate that none of the strategies is superior to the others. It turns out that multi-step problems with sparse rewards are challenging for the selected strategies, highlighting the need to develop more reliable explore/exploit strategies to tackle such environments.
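To make the dilemma concrete, the sketch below shows one of the simplest explore/exploit mechanisms: an epsilon-greedy choice over an XCS-style prediction array. It is a minimal illustration only; the function name, the dictionary representation of the prediction array, and the epsilon values are assumptions for this example and do not correspond to any specific strategy evaluated in the paper.

```python
import random

def select_action(prediction_array, epsilon=0.5, rng=random):
    """Choose an action from an XCS-style prediction array.

    prediction_array: dict mapping each action to its system prediction
    (the fitness-weighted payoff estimate of the matching classifiers
    advocating that action).
    epsilon: probability of taking an exploratory (random) step.
    """
    actions = list(prediction_array)
    if rng.random() < epsilon:
        # Explore: pick a random action, accepting a possible
        # short-term cost to gather new payoff information.
        return rng.choice(actions)
    # Exploit: pick the action with the highest predicted payoff.
    return max(actions, key=prediction_array.get)


# Illustrative use: the learner exploits unless the coin flip says explore.
predictions = {0: 310.0, 1: 742.5}
print(select_action(predictions, epsilon=0.25))
```

In this simplified view, tuning epsilon per environment plays the role that the automated parameter optimization plays for the strategies compared in the paper.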