DOI: 10.1145/1569901.1570057

A mixed discrete-continuous attribute list representation for large scale classification domains

Published: 08 July 2009

ABSTRACT

Datasets with a large number of attributes pose a difficult challenge for evolutionary learning techniques. The recently proposed attribute list rule representation has been shown to significantly improve the overall performance (e.g. run-time, accuracy, rule set size) of the BioHEL iterative evolutionary rule learning system. In this paper we, first, extend the attribute list rule representation so that it can handle not only continuous domains but also datasets with a very large number of mixed discrete-continuous attributes. Second, we benchmark the new representation on a diverse set of large-scale datasets and, third, we compare the new algorithms with several well-known machine learning methods. The experimental results described in the paper show that the new representation is equal to or better than the state of the art in evolutionary rule representations, both in terms of the accuracy obtained on the benchmark datasets and in terms of the computational time required to achieve these improved accuracies. The new attribute list representation puts BioHEL on an equal footing with other well-established machine learning techniques in terms of accuracy. In the paper, we also analyse and discuss the current weaknesses of the representation and indicate potential avenues for correcting them.
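To make the idea of an attribute list rule representation concrete, the sketch below shows a minimal, hypothetical Python encoding in which each rule stores predicates only for the attributes it actually uses, with intervals for continuous attributes and value sets for discrete ones. All names and details are illustrative assumptions and do not reproduce BioHEL's actual implementation.

```python
# Minimal sketch (an assumption, not BioHEL's code) of an attribute list rule:
# only the attributes a rule expresses appear in its list, each with a
# predicate suited to its type (interval for continuous, value set for discrete).

from dataclasses import dataclass
from typing import Dict, FrozenSet, Union


@dataclass(frozen=True)
class IntervalPredicate:
    """Continuous attribute: matches values inside [lower, upper]."""
    lower: float
    upper: float

    def matches(self, value: float) -> bool:
        return self.lower <= value <= self.upper


@dataclass(frozen=True)
class ValueSetPredicate:
    """Discrete attribute: matches values contained in an allowed set."""
    allowed: FrozenSet[str]

    def matches(self, value: str) -> bool:
        return value in self.allowed


Predicate = Union[IntervalPredicate, ValueSetPredicate]


@dataclass(frozen=True)
class AttributeListRule:
    """A rule that lists predicates only for its relevant attributes."""
    predicates: Dict[str, Predicate]   # attribute name -> predicate
    predicted_class: str

    def matches(self, example: Dict[str, Union[float, str]]) -> bool:
        # Attributes absent from the list are irrelevant to this rule, so the
        # matching cost scales with the rule's length rather than with the
        # total number of attributes in the dataset.
        return all(pred.matches(example[name])
                   for name, pred in self.predicates.items())


# Hypothetical usage with one continuous and one discrete attribute.
rule = AttributeListRule(
    predicates={
        "petal_length": IntervalPredicate(1.0, 2.5),
        "colour": ValueSetPredicate(frozenset({"red", "blue"})),
    },
    predicted_class="class_A",
)
print(rule.matches({"petal_length": 1.8, "colour": "red", "width": 0.4}))  # True
```

The design point this sketch tries to illustrate, under the stated assumptions, is that match cost grows with the number of attributes a rule actually expresses rather than with the dimensionality of the dataset, which is presumably what makes such a representation attractive for large-scale mixed discrete-continuous domains.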


Published in

GECCO '09: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation
July 2009
2036 pages
ISBN: 978-1-60558-325-9
DOI: 10.1145/1569901

            Copyright © 2009 ACM


            Publisher

            Association for Computing Machinery

            New York, NY, United States

