GAssist vs. BioHEL: critical assessment of two paradigms of genetics-based machine learning

Franco, María A.; Krasnogor, Natalio; Bacardit, Jaume

doi:10.1007/s00500-013-1016-8

Focus
Published: 03 March 2013

Volume 17, pages 953–981, (2013)

Soft Computing

María A. Franco¹,
Natalio Krasnogor¹ &
Jaume Bacardit¹,²

Abstract

This paper reports an exhaustive analysis of two specific Genetics-based Machine Learning systems: BioHEL and GAssist. The two systems share many mechanisms and operators, but they apply two different learning paradigms (the Iterative Rule Learning approach and the Pittsburgh approach, respectively). The aims of this paper are to: (a) propose standard configurations for handling small and large datasets, (b) compare the two systems in terms of learning capabilities, complexity of the obtained solutions and learning time, (c) determine the areas of the problem space where each of the two systems performs better, and (d) compare them with other well-known machine learning algorithms. The results show that it is possible to find standard configurations for both systems. With these configurations, the systems perform on a par with other state-of-the-art machine learning algorithms such as Support Vector Machines. Moreover, we identify the problem domains where each of these systems has advantages and disadvantages, and we propose ways to improve the systems based on this analysis.

Notes

This function sums 105 instead of 100 to handle border cases.
Contiguous bits that have the same value (either true or false) in the ADI representation.
Memory requirements are not reported in this paper since the memory is mostly dominated by the size of the training sets instead of the solutions generated.
No statistical tests were performed in this analysis; the conclusions are qualitative.
In the case of GAssist, the results for some configurations are missing because the runs for these configurations took more than 10 days each, which exceeds one of the constraints of our computational framework.
The accuracy of the scenario divided by the largest accuracy obtained.
For these algorithms, we only performed a global analysis to determine the best parameter settings over all the problems at the same time, similar to the analysis in Sections 5.1.1 and 5.1.2.
Even when the rule sets are not small, clustering techniques can be applied to interpret the solutions, as shown by Bassel et al. (2011).
These experiments took longer than the maximum amount of time allowed by our computational framework.
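One of the notes above defines a normalized accuracy: the accuracy of a scenario divided by the largest accuracy obtained, i.e. rel(s) = acc(s) / max over s' of acc(s'). As a minimal illustrative sketch (the function name `relative_accuracies` and the scenario labels are hypothetical, not taken from the paper), the ratio can be computed like this; the best scenario maps to 1.0 and every other scenario to its fraction of that best value.

```python
from __future__ import annotations


def relative_accuracies(accuracy_by_scenario: dict[str, float]) -> dict[str, float]:
    """Divide each scenario's accuracy by the largest accuracy obtained.

    Hypothetical helper illustrating the normalization described in the note.
    """
    if not accuracy_by_scenario:
        raise ValueError("need at least one scenario")
    best = max(accuracy_by_scenario.values())
    if best <= 0:
        # Normalizing by a non-positive maximum would be meaningless.
        raise ValueError("largest accuracy must be positive")
    return {name: acc / best for name, acc in accuracy_by_scenario.items()}


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    print(relative_accuracies({"config-a": 0.80, "config-b": 0.90, "config-c": 0.72}))
```

Dividing by the maximum rather than by a fixed reference makes scenarios with different absolute accuracy levels comparable on a common [0, 1] scale.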
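Another note refers to contiguous bits with the same value in the ADI (Adaptive Discretization Intervals) representation. The page does not show the sentence this note annotates, so the sketch below only illustrates the idea under an assumption: that one attribute's condition in a rule can be viewed as an ordered list of per-interval bits. The hypothetical helper `bit_runs` collapses such a list into maximal runs of equal value using the standard-library `itertools.groupby`.

```python
from __future__ import annotations

from itertools import groupby


def bit_runs(bits: list[bool]) -> list[tuple[bool, int]]:
    """Collapse per-interval bits into maximal runs of equal value.

    Hypothetical illustration; assumes an attribute's ADI condition is an
    ordered list of interval bits. Returns (value, run_length) pairs in order.
    """
    # groupby groups only *consecutive* equal elements, which is exactly
    # the "contiguous bits with the same value" notion.
    return [(value, sum(1 for _ in group)) for value, group in groupby(bits)]


if __name__ == "__main__":
    print(bit_runs([True, True, False, True, True, True]))
```

For the example input, `bit_runs` returns `[(True, 2), (False, 1), (True, 3)]`: three runs, even though only two distinct values occur.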

References

Aguilar-Ruiz J, Riquelme J, Toro M (2003) Evolutionary learning of hierarchical decision rules. IEEE Trans Syst Man Cybern Part B 33(2):324–331
Aha DW, Kibler D, Albert MK (1991) Instance based learning algorithms. Mach Learn 6:37–66
Alcalá-Fdez J, Sánchez L, García S, del Jesus MJ, Ventura S, Garrell JM, Otero J, Romero C, Bacardit J, Rivas VM, Fernández JC, Herrera F (2009) Keel: a software tool to assess evolutionary algorithms for data mining problems. Soft Comput 13:307–318
Bacardit J (2004) Pittsburgh Genetics-Based machine learning in the data mining era: Representations, generalization, and run-time. PhD thesis, Ramon Llull University, Barcelona, Spain
Bacardit J, Butz M (2007) Data mining in learning classifier systems: Comparing XCS with GAssist. In: Kovacs T, Llorà X, Takadama K, Lanzi P, Stolzmann W, Wilson S (eds) Learning classifier systems. Lecture Notes in computer science, vol 4399. Springer, Berlin, pp 282–290
Bacardit J, Garrell JM (2003a) Bloat control and generalization pressure using the minimum description length principle for a pittsburgh approach learning classifier system. In: Proceedings of the 6th international workshop on learning classifier systems
Bacardit J, Garrell JM (2003b) Evolving multiple discretizations with adaptive intervals for a pittsburgh Rule-Based learning classifier system. In: Proceedings of the genetic and evolutionary computation conference-GECCO2003, LNCS 2724, Springer, Berlin, pp 1818–1831
Bacardit J, Krasnogor N (2008) Empirical evaluation of ensemble techniques for a pittsburgh learning classifier system. In: Learning classifier systems. Lecture notes on computer science, vol 4998. Springer, Berlin, pp 255–268
Bacardit J, Krasnogor N (2009a) A mixed discrete-continuous attribute list representation for large scale classification domains. In: GECCO ’09: Proceedings of the 11th Annual conference on Genetic and evolutionary computation. ACM Press, New York, pp 1155–1162
Bacardit J, Krasnogor N (2009b) Performance and efficiency of memetic pittsburgh learning classifier systems. Evolut Comput J 17(3)
Bacardit J, Goldberg DE, Butz MV, Llorá X, Garrell JM (2004) Speeding-up Pittsburgh learning classifier systems: modeling time and accuracy. In: Parallel Problem Solving from Nature-PPSN VIII. Lecture notes in computer science, vol 3242, chap 103. Springer, Berlin, pp 1021–1031
Bacardit J, Bernadó-Mansilla E, Butz MV (2007a) Learning classifier systems: Looking back and glimpsing ahead. In: Bacardit J, Bernadó-Mansilla E, Butz MV, Kovacs T, Llorà X, Takadama K (eds) IWLCS, Lecture Notes in Computer Science, vol 4998, Springer, Berlin, pp 1–21
Bacardit J, Goldberg DE, Butz MV (2007b) Improving the performance of a Pittsburgh learning classifier system using a default rule. In: Learning Classifier Systems, Revised Selected Papers of the International Workshop on Learning Classifier Systems 2003–2005, LNCS 4399, Springer, Berlin, pp 291–307
Bacardit J, Burke EK, Krasnogor N (2009a) Improving the scalability of rule-based evolutionary learning. Memetic Comput 1(1):55–67
Bacardit J, Stout M, Hirst JD, Valencia A, Smith R, Krasnogor N (2009b) Automated alphabet reduction for protein datasets. BMC Bioinform 10(1):6
Bassel GW, Glaab E, Marquez J, Holdsworth MJ, Bacardit J (2011) Functional network construction in arabidopsis using rule-based machine learning on large-scale data sets. Plant Cell Online 23(9):3101–3116
Bernadó-Mansilla E, Garrell JM (2003) Accuracy-based learning classifier systems: models, analysis and applications to classification tasks. Evol Comput 11(3):209–238
Bernadó-Mansilla E, Llorà X, Garrell JM (2006) XCS and GALE: a comparative study of two learning classifier systems on data mining. In: Lanzi P, Stolzmann W, Wilson S (eds) Advances in learning classifier systems. Lecture notes in computer science, chap 8, vol 2321. Springer, Berlin, pp 115–132
Blake C, Keogh E, Merz C (1998) UCI repository of machine learning databases. url:http://www.ics.uci.edu/mlearn/MLRepository.html
Browne WN, Ioannides C (2007) Investigating scaling of an abstracted LCS utilising ternary and s-expression alphabets. In: Proceedings of the 2007 GECCO conference companion on genetic and evolutionary computation. ACM Press, London, pp 2759–2764
Bull L (2001) Simple markov models of the genetic algorithm in classifier systems: Multi-step tasks. In: IWLCS ’00: revised papers from the third international workshop on advances in learning classifier systems. Springer, London, pp 29–36
Bull L, Hurst J (2000) Self-Adaptive mutation in ZCS controllers. In: Lecture notes in computer science, chapter 33, vol 1803. Springer, Berlin, pp 342–349
Bull L, Studley M, Bagnall A, Whittley I (2007) Learning classifier system ensembles with rule-sharing. IEEE Trans Evolut Comput 11(4):496–502
Butz MV (2005) Kernel-based, ellipsoidal conditions in the real-valued XCS classifier system. In: Proceedings genetic evolutionary computation conference GECCO 2005. ACM, New York, pp 1835–1842
Butz MV, Herbort O (2008) Context-dependent predictions and cognitive arm control with XCSF. In: Proceedings of the 10th annual conference on genetic and evolutionary computation, GECCO ’08. ACM Press, New York, pp 1357–1364
Butz MV, Goldberg DE, Lanzi PL (2005) Gradient descent methods in learning classifier systems: improving XCS performance in multistep problems. IEEE Trans Evolut Comput 9(5):452–473
Butz MV, Lanzi PL, Llorà X, Loiacono D (2008a) An analysis of matching in learning classifier systems. In: GECCO ’08: Proceedings of the 10th annual conference on genetic and evolutionary computation. ACM Press, New York, pp 1349–1356
Butz MV, Lanzi PL, Wilson SW (2008b) Function approximation with XCS: hyperellipsoidal conditions, recursive least squares, and compaction. IEEE Trans Evolut Comput 12(3):355–376
Butz MV, Stalph PO, Lanzi PL (2008c) Self-adaptive mutation in XCSF. In: Proceedings of the 10th annual conference on genetic and evolutionary computation. ACM Press, Atlanta, pp 1365–1372
De Jong K (1988) Learning with genetic algorithms: an overview. Mach Learn 3(2-3):121–138
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Franco M, Martínez I, Gorrin C (2010a) Supply chain management sales using XCSR. In: Bacardit J, Browne W, Drugowitsch J, Bernadó-Mansilla E, Butz M (eds) Learning classifier systems. Lecture notes in computer science, vol 6471, Springer, Berlin, pp 145–165
Franco MA, Krasnogor N, Bacardit J (2010b) Analysing BioHEL using challenging boolean functions. In: GECCO ’10: Proceedings of the 12th annual conference companion on genetic and evolutionary computation. ACM Press, New York, pp 1855–1862
Franco MA, Krasnogor N, Bacardit J (2010c) Speeding up the evaluation of evolutionary learning systems using GPGPUs. In: GECCO ’10: Proceedings of the 12th annual conference on genetic and evolutionary computation. ACM, New York, pp 1039–1046
Frank E, Witten IH (1998) Generating accurate rule sets without global optimization. In: Proceedings of the fifteenth international conference on machine learning, ICML ’98. Morgan Kaufmann Publishers Inc., San Francisco, pp 144–151
Freitas AA (2002) Data mining and knowledge discovery with evolutionary algorithms. Springer, New York
Freitas AA (2008) A review of evolutionary algorithms for data mining. In: Maimon O, Rokach L (eds) Soft computing for knowledge discovery and data mining. Springer US, pp 79–111
Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675–701
García S, Fernández A, Luengo J, Herrera F (2009) A study of statistical techniques and performance measures for genetics-based machine learning: accuracy and interpretability. Soft Comput Fusion Found Methodol Appl 13(10):959–977
Goldberg DE (1989) Genetic algorithms for search, optimization and machine learning. Addison-Wesley Longman Publishing Co., Inc., Boston
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor Newsl 11(1):10–18
Holland J (1975) Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. University of Michigan Press, Ann Arbor
Holland JH, Reitman JS (1978) Cognitive systems based on adaptive algorithms. In: Hayes-Roth D, Waterman F (eds) Pattern-directed inference systems. Academic Press, New York, pp 313–329
Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Statist 6(2):65–70
Hruschka E, Campello R, Freitas A, de Carvalho A (2009) A survey of evolutionary algorithms for clustering. IEEE Trans Syst Man Cybern C 39(2):133–155
Hurst J, Bull L (2001) Self-Adaptation in classifier system controllers. Artif Life Robotics 5:109–119
Hurst J, Bull L (2002) A self-adaptive XCS. In: Lanzi P, Stolzmann W, Wilson S (eds) Advances in learning classifier systems. Lecture notes in computer science, vol 2321, Springer, Berlin, pp 333–360. doi:10.1007/3-540-48104-4_5
Hurst J, Bull L (2006) A neural learning classifier system with self-adaptive constructivism for mobile robot control. Artif Life 12:353–380
Janikow CZ (1993) A knowledge-intensive genetic algorithm for supervised learning. Mach Learn 13(2-3):189–228
Jin Y (2005) A comprehensive survey of fitness approximation in evolutionary computation. Soft Comput 9(1):3–12
John G, Langley P (1995) Estimating continuous distributions in bayesian classifiers. In: Proceedings of the eleventh conference on uncertainty in artificial intelligence. Morgan Kaufmann, Burlington, pp 338–345
De Jong K, Spears WM (1991) Learning concept classification rules using genetic algorithms. In: Proceedings of the 12th international joint conference on artificial intelligence, vol 2, Morgan Kaufmann Publishers Inc., Sydney, pp 651–656
Lanzi PL (2008) Learning classifier systems: then and now. Evolut Intell 1(1):63–82
Lanzi PL, Perrucci A (1999a) Extending the representation of classifier conditions part I: from binary to messy coding. In: Banzhaf W, Daida J, Eiben AE, Garzon MH, Honavar V, Jakiela M, Smith RE (eds) Proceedings of the genetic and evolutionary computation conference, vol 1. Morgan Kaufmann, Orlando, pp 345–352
Lanzi PL, Perrucci A (1999b) Extending the representation of classifier conditions part II: from messy coding to S-Expressions. In: Banzhaf W, Daida J, Eiben AE, Garzon MH, Honavar V, Jakiela M, Smith RE (eds) Proceedings of the genetic and evolutionary computation conference, vol 1. Morgan Kaufmann, Orlando, pp 345–352
Lanzi PL, Wilson SW (2006) Using convex hulls to represent classifier conditions. In: GECCO ’06: Proceedings of the 8th annual conference on genetic and evolutionary computation. ACM Press, New York, pp 1481–1488
Llorà X, Garrell JM (2000) Evolving agent aggregates using cellular genetic algorithms. In: Whitley LD, Goldberg DE, Cantú-Paz E, Spector L, Parmee IC, Beyer HG (eds) GECCO. Morgan Kaufmann, Burlington, p 868
Llorà X, Sastry K (2006) Fast rule matching for learning classifier systems via vector instructions. In: GECCO ’06: Proceedings of the 8th annual conference on genetic and evolutionary computation. ACM Press, New York, pp 1513–1520
Llorà X, Reddy R, Matesic B, Bhargava R (2007a) Towards better than human capability in diagnosing prostate cancer using infrared spectroscopic imaging. In: Proceedings of the 9th annual conference on genetic and evolutionary computation, GECCO ’07. ACM Press, New York, pp 2098–2105
Llorà X, Sastry K, Yu T, Goldberg DE (2007b) Do not match, inherit: fitness surrogates for genetics-based machine learning techniques. In: GECCO ’07: Proceedings of the 9th annual conference on genetic and evolutionary computation. ACM, New York, pp 1798–1805
Mellor D (2005) A first order logic classifier system. In: GECCO ’05: Proceedings of the 2005 conference on genetic and evolutionary computation. ACM Press, New York, pp 1819–1826
Nemenyi P (1963) Distribution-free multiple comparisons. PhD thesis, Princeton University, USA
Orriols-Puig A, Casillas J, Bernadó-Mansilla E (2008a) A comparative study of several genetic-based classifiers in supervised learning. In: Learning classifier systems in data mining. Studies in computational intelligence, chap 10, vol 125. Springer, Berlin, pp 205–230
Orriols-Puig A, Sastry K, Goldberg D, Bernadó-Mansilla E (2008b) Substructural surrogates for learning decomposable classification problems. In: Bacardit J, Bernadó-Mansilla E, Butz M, Kovacs T, Llorà X, Takadama K (eds) Learning classifier systems. Lecture notes in computer science, vol 4998, Springer, Berlin, pp 235–254. doi:10.1007/978-3-540-88138-4_14
Platt JC (1999) Fast training of support vector machines using sequential minimal optimization, MIT Press, Cambridge, pp 185–208
Quinlan JR (1993) C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco
Rissanen J (1978) Modeling by shortest data description. Automatica 14:465–471
Sarafis IA (2005) Data mining clustering of high dimensional databases with evolutionary algorithms. PhD thesis, Department of Computer Science, School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, Scotland, UK
Sastry K (2005) Principled efficiency enhancement techniques. In: Genetic and Evolutionary Computation Conference-GECCO 2005-Tutorial. Available at url:http://www.illigal.uiuc.edu/web/kumara/2005/11/24/principled-efficiency-enhancement-techniques/
Smith SF (1980) A learning system based on genetic adaptive algorithms. PhD thesis, University of Pittsburgh, url:http://portal.acm.org/citation.cfm?id=909835
Smith SF (1983) Flexible learning of problem solving heuristics through adaptive search. In: Proceedings of the eighth international joint conference on artificial intelligence, vol 1. Morgan Kaufmann Publishers Inc., Karlsruhe, pp 422–425
Smith R, Jiang M, Bacardit J, Stout M, Krasnogor N, Hirst J (2010) A learning classifier system with mutual-information-based fitness. Evolut Intell 3(1):31–50
Stout M, Bacardit J, Hirst JD, Krasnogor N (2008) Prediction of recursive convex hull class assignments for protein residues. Bioinformatics 24(7):916–923
Stout M, Bacardit J, Hirst JD, Smith RE, Krasnogor N (2009) Prediction of topological contacts in proteins using learning classifier systems. Soft Comput 13:245–258
Tabacman M, Bacardit J, Loiseau I, Krasnogor N (2008) Learning classifier systems in optimisation problems: a case study on fractal travelling salesman problems. In: Proceedings of the international workshop on learning classifier systems, Springer, Lecture Notes in Computer Science, vol (to appear)
Urbanowicz R, Moore J (2010) The application of pittsburgh-style learning classifier systems to address genetic heterogeneity and epistasis in association studies. In: Schaefer R, Cotta C, Kolodziej J, Rudolph G (eds) Parallel problem solving from nature-PPSN XI. Lecture notes in computer science, chap 41, vol 6238. Springer, Berlin, pp 404–413
Urbanowicz RJ, Moore JH (2009) Learning classifier systems: a complete introduction, review, and roadmap. J Artif Evol Appl 2009:1–25
Venturini G (1993) SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts. In: Brazdil PB (ed) Machine Learning: ECML-93—Proceedings of the European conference on machine learning. Springer, Berlin, pp 280–296
Wilcoxon F (1945) Individual comparisons by ranking methods. Biometrics 1(6):80–83
Wilson SW (1995) Classifier fitness based on accuracy. Evol Comput 3(2):149–175
Wilson SW (2000) Get real! XCS with continuous-valued inputs. In: Learning classifier systems. From foundations to applications. LNAI-1813, Springer, Berlin, pp 209–219
Wilson SW (2001) Mining oblique data with XCS. In: Luca Lanzi P, Stolzmann W, Wilson S (eds) Advances in learning classifier systems. Lecture notes in computer science, vol 1996, Springer, Berlin, pp 283–290
Wilson SW (2002) Classifiers that approximate functions. Natural Comput 1(2–3):211–234

Acknowledgments

The authors would like to thank the UK Engineering and Physical Sciences Research Council (EPSRC) for its support under grant EP/H016597/1. They would also like to acknowledge the High Performance Computing facility at the University of Nottingham for providing the necessary framework for these experiments.

Author information

Authors and Affiliations

ICOS Research Group, School of Computer Science, University of Nottingham, Nottingham, NG8 1BB, UK
María A. Franco, Natalio Krasnogor & Jaume Bacardit
Multi-disciplinary Centre for Integrative Biology (MyCIB), School of Biosciences, University of Nottingham, Sutton Bonington, LE12 5RD, UK
Jaume Bacardit

Corresponding author

Correspondence to María A. Franco.

Additional information

Communicated by A-A Tantar.

Electronic supplementary material

The electronic supplementary material is provided as a PDF (121 KB).

About this article

Cite this article

Franco, M.A., Krasnogor, N. & Bacardit, J. GAssist vs. BioHEL: critical assessment of two paradigms of genetics-based machine learning. Soft Comput 17, 953–981 (2013). https://doi.org/10.1007/s00500-013-1016-8

Published: 03 March 2013
Issue Date: June 2013
DOI: https://doi.org/10.1007/s00500-013-1016-8
