ABSTRACT
In this work we present an extensive empirical analysis of the BioHEL evolutionary rule learning system using a broad set of variants of the well-known k-DNF boolean function. These functions pose a wide range of challenges for most machine learning techniques, such as varying degrees of rule specificity, class imbalance and niche overlap. Moreover, because the ideal solutions are known, one can easily assess whether a learning system is able to find them, and how fast. Specifically, we study two aspects of BioHEL: its sensitivity to the coverage breakpoint parameter (which determines the degree of generality pressure applied by the fitness function) and its default rule policy. The results show that BioHEL is highly sensitive to the choice of coverage breakpoint (as expected) and that using a suitable default class, known beforehand, allows the system to learn faster than using a majority class policy. Moreover, the experiments indicate that the scalability of BioHEL depends directly on both k (the specificity of the rules) and the number of DNF terms in the problem.
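To make the benchmark family concrete, the following is a minimal sketch of how a random k-DNF function could be built and how its class balance could be measured. It is an illustration under assumptions, not the generator used in the paper: the function names (random_kdnf, evaluate, positive_fraction), the choice of 10 input variables and 3 terms, and the uniform sampling of literals are all hypothetical.

```python
import itertools
import random


def random_kdnf(n_vars, k, n_terms, rng):
    """Build a k-DNF as a list of terms.

    Each term is a dict mapping a variable index to the bit value it
    requires, so every term is a conjunction of exactly k literals.
    """
    terms = []
    for _ in range(n_terms):
        variables = rng.sample(range(n_vars), k)  # k distinct variables
        terms.append({v: rng.randint(0, 1) for v in variables})
    return terms


def evaluate(terms, x):
    """A DNF is true when at least one of its terms is fully satisfied."""
    return any(all(x[v] == bit for v, bit in term.items()) for term in terms)


def positive_fraction(terms, n_vars):
    """Fraction of the 2**n_vars inputs labelled positive (class balance)."""
    inputs = itertools.product((0, 1), repeat=n_vars)
    return sum(evaluate(terms, x) for x in inputs) / 2 ** n_vars


if __name__ == "__main__":
    rng = random.Random(0)
    for k in (2, 4, 6):
        f = random_kdnf(n_vars=10, k=k, n_terms=3, rng=rng)
        print(f"k={k}: positive fraction = {positive_fraction(f, 10):.3f}")
```

In this sketch a larger k makes each term more specific, so each term covers fewer inputs and the positive class tends to become rarer, while terms that share satisfying inputs produce overlapping niches. These correspond to the rule specificity, class imbalance and niche overlap mentioned in the abstract.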
Index Terms
- Analysing BioHEL using challenging boolean functions