short-paper

Analysing bioHEL using challenging boolean functions

Authors:
Maria A. Franco, University of Nottingham, Nottingham, United Kingdom
Natalio Krasnogor, University of Nottingham, Nottingham, United Kingdom
Jaume Bacardit, University of Nottingham, Nottingham, United Kingdom

GECCO '10: Proceedings of the 12th annual conference companion on Genetic and evolutionary computation, July 2010, Pages 1855–1862. https://doi.org/10.1145/1830761.1830817

Published: 07 July 2010

ABSTRACT

In this work we present an exhaustive empirical analysis of the Pittsburgh-style BioHEL system using a broad set of variants of the well-known k-DNF boolean function. These functions present a broad set of possible challenges for most machine learning techniques such as varying degrees of rule specificity, class unbalance and niche overlap. Moreover, as the ideal solutions are known, one can easily assess if a learning system is able to find them, and how fast. Specifically, we study two aspects of BioHEL: its sensitivity to the coverage breakpoint parameter (that determines the degree of generality pressure applied by the fitness function) and the default rule policy. The results show that BioHEL is highly sensitive to the choice of coverage breakpoint (as was expected) and that using a suitable (known beforehand) default class allows the system to learn faster than using a majority class policy. Moreover, the experiments indicate that BioHEL scalability depends directly on both k (the specificity of the rules) and the number of DNF terms in the problem.
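To make the benchmark family concrete, the following is a minimal sketch of how a k-DNF boolean function could be generated and labelled exhaustively. It is not the generator used in the paper or anything taken from BioHEL itself: the function names (`random_kdnf`, `evaluate`, `truth_table`, `positive_fraction`, `term_coverage`) and the dictionary encoding of terms are assumptions made for illustration only.

```python
import itertools
import random


def random_kdnf(n_vars, k, n_terms, seed=0):
    """Build a random k-DNF formula (hypothetical helper).

    The formula is a list of terms; each term is a dict mapping a
    variable index to the bit value that variable must take, and
    every term has exactly k literals.
    """
    rng = random.Random(seed)
    terms = []
    for _ in range(n_terms):
        variables = rng.sample(range(n_vars), k)
        terms.append({v: rng.randint(0, 1) for v in variables})
    return terms


def evaluate(terms, x):
    """Return True if at least one conjunctive term is satisfied by x."""
    return any(all(x[v] == bit for v, bit in term.items()) for term in terms)


def truth_table(terms, n_vars):
    """Label every one of the 2^n_vars inputs with the formula's output."""
    inputs = itertools.product((0, 1), repeat=n_vars)
    return [(x, evaluate(terms, x)) for x in inputs]


def positive_fraction(table):
    """Fraction of inputs in the positive class (a class-balance measure)."""
    return sum(label for _, label in table) / len(table)


def term_coverage(term):
    """A term with k literals matches a fraction 2^-k of the input space."""
    return 2.0 ** -len(term)


# A hand-written 2-DNF over 4 variables: (x0 AND x1) OR (NOT x2 AND x3).
terms = [{0: 1, 1: 1}, {2: 0, 3: 1}]
table = truth_table(terms, 4)
print(positive_fraction(table))   # 7 of 16 inputs are positive
print(term_coverage(terms[0]))    # each term alone covers 4 of 16 inputs
```

In the hand-written example each term covers 4 of the 16 inputs, yet only 7 inputs are positive because the two terms share one input. That shared input is a small instance of niche overlap. Raising k makes each ideal rule more specific (smaller coverage), and changing the number of terms shifts the class balance, which are the properties the abstract varies.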

Index Terms

Computing methodologies > Machine learning > Machine learning approaches > Logical and relational learning > Inductive logic learning

Published in
GECCO '10: Proceedings of the 12th annual conference companion on Genetic and evolutionary computation
July 2010
1496 pages
ISBN:9781450300735
DOI:10.1145/1830761
General Chair: Martin Pelikan, University of Missouri, USA
Program Chair: Jürgen Branke, University of Warwick, Coventry, UK
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
evolutionary algorithms
large-scale datasets
learning classifier systems
rule induction
Qualifiers
- short-paper
Acceptance Rates
Overall Acceptance Rate: 1,669 of 4,410 submissions, 38%