Skip to main content
Log in

Clustering-based initialization of Learning Classifier Systems

Effects on model performance, readability and induction time

  • Original Paper
  • Published:
Soft Computing Aims and scope Submit manuscript

Abstract

The present paper investigates whether an “informed” initialization process can help supervised LCS algorithms evolve rulesets with better characteristics, including greater predictive accuracy, shorter training times, and/or more compact knowledge representations. Inspired by previous research suggesting that the initialization phase of evolutionary algorithms may have a considerable impact on their convergence speed and the quality of the achieved solutions, we present an initialization method for the class of supervised Learning Classifier Systems (LCS) that extracts information about the structure of studied problems through a pre-training clustering phase and exploits this information by transforming it into rules suitable for the initialization of the learning process. The effectiveness of our approach is evaluated through an extensive experimental phase, involving a variety of real-world classification tasks. Obtained results suggest that clustering-based initialization can indeed improve the predictive accuracy, as well as the interpretability of the induced knowledge representations, and paves the way for further investigations of the potential of better-than-random initialization methods for LCS algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3

Similar content being viewed by others

Notes

  1. The deletion scheme employed is adapted from the one reported in Kovacs (1999).

  2. We assume that no expert knowledge on the classification task and/or its solution is available at the time of learning.

  3. Code for both algorithms, as well as the clustering-based initialization process, is available upon request from the first author.

References

  • Aguilar-Ruiz JS, Riquelme JC, Toro M (2003) Evolutionary learning of hierarchical decision rules. IEEE Trans Syst Man Cybern B 33(2):324–331

    Article  Google Scholar 

  • Aguilar-Ruiz J, Giraldez R, Riquelme J (2007) Natural encoding for evolutionary supervised learning. IEEE Trans Evol Comput 11(4):466–479. doi:10.1109/TEVC.2006.883466

    Article  Google Scholar 

  • Alcalá-Fdez J, Sánchez L, García S, del Jesus M, Ventura S, Garrell J, Otero J, Romero C, Bacardit J, Rivas V, Fernández J, Herrera F (2009) Keel: a software tool to assess evolutionary algorithms for data mining problems. Soft Comput 13:307–318. doi:10.1007/s00500-008-0323-y

    Article  Google Scholar 

  • Asuncion A, Newman DJ (2010) UCI machine learning repository. University of California, School of Information and Computer Science, Irvine. http://archive.ics.uci.edu/ml

  • Bacardit J (2004) Pittsburgh genetics-based machine learning in the data mining era: representations, generalization, and run-time. PhD thesis, Ramon Llull University, Barcelona, Catalonia, Spain

  • Bacardit J (2005) Analysis of the initialization stage of a Pittsburgh approach learning classifier system. In: Proceedings of the 2005 conference on genetic and evolutionary computation (GECCO ’05). ACM, New York, pp 1843–1850

  • Bernadó-Mansilla E, Garrell-Guiu JM (2003) Accuracy-based learning classifier systems: models, analysis and applications to classification tasks. Evol Comput 11(3):209–238. doi:10.1162/106365603322365289

    Article  Google Scholar 

  • Bonelli P, Parodi A, Sen S, Wilson SW (1990) NEWBOOLE: a fast GBML system. In: Proceedings of the 7th international conference on machine learning. Morgan Kaufmann, San Francisco, pp 153–159

  • Breiman L (2002) Wald Lecture II—looking inside the black box. In: 277th meeting of the Institute of Mathematical Statistics

  • Burke EK, Newall JP, Weare RF (1998) Initialization strategies and diversity in evolutionary timetabling. Evol Comput 6:81–103

    Article  Google Scholar 

  • Butz MV, Wilson SW (2001) An algorithmic description of XCS. In: IWLCS ’00: revised papers from the third international workshop on advances in learning classifier systems. Springer, London, pp 253–272

  • Butz MV, Kovacs T, Lanzi PL, Wilson SW (2004) Toward a theory of generalization and learning in XCS. IEEE Trans Evol Comput 8(1):28–46

    Article  Google Scholar 

  • Butz MV, Sastry K, Goldberg DE (2005) Strong, stable, and reliable fitness pressure in XCS due to tournament selection. Genet Program Evol Mach 6(1):53–77. doi:10.1007/s10710-005-7619-9

    Article  Google Scholar 

  • Chou CH, Chen JN (2000) Genetic algorithms: initialization schemes and genes extraction. In: The ninth IEEE international conference on fuzzy systems, 2000 (FUZZ IEEE 2000), vol 2, pp 965–968

  • De Jong KA (1975) An analysis of the behavior of a class of genetic adaptive systems. PhD thesis, University of Michigan, Ann Arbor, MI, USA

  • De Jong KA, Spears WM, Gordon DF (1993) Using genetic algorithms for concept learning. Mach Learn 13:161–188. doi:10.1007/BF00993042

    Article  Google Scholar 

  • Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

    MathSciNet  MATH  Google Scholar 

  • Dixon PW, Corne DW, Oates MJ (2003) A ruleset reduction algorithm for the XCS learning classifier system. In: Learning classifier systems, 5th international workshop, IWLCS 2002, Granada, Spain, September 7–8, 2002, revised papers. Lecture notes in computer science, vol 2661. Springer, Berlin-Heidelberg, pp 20–29

  • Frank E, Witten IH (1998) Generating accurate rule sets without global optimization. In: ICML ’98: proceedings of the fifteenth international conference on machine learning. Morgan Kaufmann, San Francisco, pp 144–151

  • Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675–701

    Google Scholar 

  • Friedman M (1940) A comparison of alternative tests of significance for the problem of m rankings. Ann Math Stat 11(1):86–92

    Article  MATH  Google Scholar 

  • García S, Fernández A, Luengo J, Herrera F (2009) A study of statistical techniques and performance measures for genetics-based machine learning: accuracy and interpretability. Soft Comput 13:959–977. doi:10.1007/s00500-008-0392-y

    Article  Google Scholar 

  • González A, Pére R (1999) Slave: a genetic learning system based on an iterative approach. IEEE Trans Fuzzy Syst 7(2):176–191. doi:10.1109/91.755399

    Article  Google Scholar 

  • Guan SU, Zhu F (2005) An incremental approach to genetic-algorithms-based classification. IEEE Trans Syst Man Cybern B: Cybern 35(2):227–239. doi:10.1109/TSMCB.2004.842247

    Article  Google Scholar 

  • Holland JH (1975) Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control and artificial intelligence. University of Michigan Press, Ann Arbor

    Google Scholar 

  • Holmes JH, Sager JA (2005) Rule discovery in epidemiologic surveillance data using EpiXCS: an evolutionary computation approach. In: Miksch S, Hunter J, Keravnou ET (eds) AIME. Lecture notes in computer science, vol 3581. Springer, Heidelberg, pp 444–452

  • Ishibuchi H, Yamamoto T, Nakashima T (2005) Hybridization of fuzzy gbml approaches for pattern classification problems. IEEE Trans Syst Man Cybern B: Cybern 35(2):359–365. doi:10.1109/TSMCB.2004.842257

    Article  Google Scholar 

  • Janikow CZ (1992) Inductive learning of decision rules from attribute-based examples: a knowledge-intensive genetic algorithm approach. PhD thesis, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

  • Kallel L, Schoenauer M (1997) Alternative random initialization in genetic algorithms. In: Proceedings of the 7th international conference on genetic algorithms. Morgan Kaufmann, pp 268–275

  • Kang RG, Jung CY (2006) The improved initialization method of genetic algorithm for solving the optimization problem. In: King I, Wang J, Chan LW, Wang D (eds) Neural information processing. Lecture notes in computer science, vol 4234. Springer, Berlin-Heidelberg, pp 789–796

  • Kovacs T (1999) Deletion schemes for classifier systems. In: Banzhaf W, Daida J, Eiben AE, Garzon MH, Honavar V, Jakiela M, Smith RE (eds) Proceedings of the genetic and evolutionary computation conference (GECCO-99). Morgan Kaufmann, San Francisco, pp. 329–336

  • Kovacs T (2002a) XCS’s strength-based twin: part I. In: Lanzi et al (2003), pp 61–80

  • Kovacs T (2002b) XCS’s strength-based twin: part II. In: Lanzi et al (2003), pp 81–98

  • Lanzi PL (2008) Learning classifier systems: then and now. Evol Intell 1(1):63–82

    Article  Google Scholar 

  • Lanzi PL, Stolzmann W, Wilson SW (eds) (2003) Learning classifier systems. 5th international workshop, IWLCS 2002, Granada, Spain, September 7–8, 2002, revised papers. Lecture notes in computer science, vol 2661. Springer, Berlin

  • Louis SJ, McDonnell J (2004) Learning with case-injected genetic algorithms. IEEE Trans Evol Comput 8(4):316–328

    Article  Google Scholar 

  • Maaranen H, Miettinen K, Mäkelä MM (2004) Quasi-random initial population for genetic algorithms. Comput Math Appl 47(12):1885–1895

    Article  MathSciNet  MATH  Google Scholar 

  • Mitchell TM (1997) Machine learning. McGraw-Hill Higher Education

  • Nemenyi PB (1963) Distribution-free multiple comparisons. PhD thesis, Princeton University

  • Orriols-Puig A, Bernadó-Mansilla E (2008a) Mining imbalanced data with learning classifier systems. In: Bull L, Bernadó-Mansilla E, Holmes JH (eds) Learning classifier systems in data mining. Studies in computational intelligence, vol 125. Springer, Berlin, pp 123–145. doi:10.1007/978-3-540-78979-6_6

  • Orriols-Puig A, Bernadó-Mansilla E (2008b) Revisiting UCS: description, fitness sharing, and comparison with XCS. Learning classifier systems: 10th international workshop, IWLCS 2006, Seattle, MA, USA, July 8, 2006 and 11th international workshop, IWLCS 2007, London, UK, July 8, 2007, revised selected papers, pp 96–116. doi:10.1007/978-3-540-88138-4_6

  • Orriols-Puig A, Goldberg DE, Sastry K, Bernadó-Mansilla E (2007) Modeling XCS in class imbalances: population size and parameter settings. In: GECCO ’07: proceedings of the 9th annual conference on Genetic and evolutionary computation. ACM, New York, pp 1838–1845. doi:10.1145/1276958.1277324

  • Orriols-Puig A, Casillas J, Bernadó-Mansilla E (2008) Genetic-based machine learning systems are competitive for pattern recognition. Evol Intell 1:209–232. doi:10.1007/s12065-008-0013-9

    Article  Google Scholar 

  • Orriols-Puig A, Casillas J, Bernadó-Mansilla E (2009) Fuzzy-UCS: a Michigan-style learning fuzzy-classifier system for supervised learning. IEEE Trans Evol Comput 13(2):260–283

    Article  Google Scholar 

  • Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Francisco

    Google Scholar 

  • Quinlan JR (1996) Learning first-order definitions of functions. J Artif Intell Res 5:139–161

    MATH  Google Scholar 

  • Rahnamayan S, Tizhoosh HR, Salama MMA (2007) A novel population initialization method for accelerating evolutionary algorithms. Comput Math Appl 53(10):1605–1614

    Article  MathSciNet  MATH  Google Scholar 

  • Ramsey CL, Grefenstette JJ (1993) Case-based initialization of genetic algorithms. In: Proceedings of the 5th international conference on genetic algorithms. Morgan Kaufmann, San Francisco, pp 84–91

  • Tzima F, Mitkas P (2010) Comparing strength and accuracy-based supervised learning classifier systems. Technical report, Intelligent Systems and Software Engineering Labgroup, Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece, GR-541 24

  • Tzima FA, Mitkas PA, Voukantsis D, Karatzas KD (2011) Sparse episode identification in environmental datasets: the case of air quality assessment. Expert Syst Appl 38(5):5019–5027

    Article  Google Scholar 

  • Vavliakis KN, Symeonidis AL, Mitkas PA (2010) Towards understanding how personality, motivation, and events trigger Web user activity. In: IEEE/WIC/ACM international conference on Web intelligence and intelligent agent technology

  • Venturini G (1993) Sia: a supervised inductive algorithm with genetic search for learning attributes based concepts. In: Proceedings of the European conference on machine learning. Springer, London, pp 280–296

  • Wilcoxon F (1945) Individual comparisons by ranking methods. Biom Bull 1(6):80–83

    Article  Google Scholar 

  • Wilson SW (1994) ZCS: A zeroth-level classifier system. Evol Comput 2(1):1–18

    Article  Google Scholar 

  • Wilson SW (1995) Classifier fitness based on accuracy. Evol Comput 3(2):149–175

    Article  Google Scholar 

  • Wilson SW (2002) Compact rulesets from XCSI. In: IWLCS ’01: revised papers from the 4th international workshop on advances in learning classifier systems. Springer, London, pp 197–210

  • Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco

  • Zhang G, Gao L, Shi Y (2011) An effective genetic algorithm for the flexible job-shop scheduling problem. Expert Syst Appl 38(4):3563–3573

    Article  Google Scholar 

Download references

Acknowledgments

This paper is part of the 03ED735 research project, implemented within the framework of the “Reinforcement Programme of Human Research Manpower” (PENED) and cofinanced by National and Community Funds (25% from the Greek Ministry of Development-General Secretariat of Research and Technology and 75% from E.U.-European Social Funding).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Fani A. Tzima.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Tzima, F.A., Mitkas, P.A. & Theocharis, J.B. Clustering-based initialization of Learning Classifier Systems. Soft Comput 16, 1267–1286 (2012). https://doi.org/10.1007/s00500-012-0811-y

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00500-012-0811-y

Keywords

Navigation