DOI: 10.1145/1569901.1570057

A mixed discrete-continuous attribute list representation for large scale classification domains

Published: 08 July 2009

ABSTRACT

Datasets with a large number of attributes pose a difficult challenge for evolutionary learning techniques. The recently proposed attribute list rule representation has been shown to significantly improve the overall performance (e.g. run-time, accuracy, rule set size) of the BioHEL iterative evolutionary rule learning system. In this paper we, first, extend the attribute list rule representation so that it can handle not only continuous domains but also datasets with a very large number of mixed discrete-continuous attributes. Second, we benchmark the new representation on a diverse set of large-scale datasets and, third, we compare the new algorithms with several well-known machine learning methods. The experimental results described in the paper show that the new representation is equal to or better than the state of the art in evolutionary rule representations, both in terms of the accuracy obtained on the benchmark datasets and in terms of the computational time required to achieve these improved accuracies. The new attribute list representation puts BioHEL on an equal footing with other well-established machine learning techniques in terms of accuracy. In the paper, we also analyse and discuss the current weaknesses of the representation and indicate potential avenues for correcting them.
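To make the idea of an attribute list rule representation concrete, the sketch below shows a minimal, hypothetical Python encoding in which each rule stores predicates only for the attributes it actually uses, with intervals for continuous attributes and value sets for discrete ones. All names and details are illustrative assumptions and do not reproduce BioHEL's actual implementation.

```python
# Minimal sketch (an assumption, not BioHEL's code) of an attribute list rule:
# only the attributes a rule expresses appear in its list, each with a
# predicate suited to its type (interval for continuous, value set for discrete).

from dataclasses import dataclass
from typing import Dict, FrozenSet, Union


@dataclass(frozen=True)
class IntervalPredicate:
    """Continuous attribute: matches values inside [lower, upper]."""
    lower: float
    upper: float

    def matches(self, value: float) -> bool:
        return self.lower <= value <= self.upper


@dataclass(frozen=True)
class ValueSetPredicate:
    """Discrete attribute: matches values contained in an allowed set."""
    allowed: FrozenSet[str]

    def matches(self, value: str) -> bool:
        return value in self.allowed


Predicate = Union[IntervalPredicate, ValueSetPredicate]


@dataclass(frozen=True)
class AttributeListRule:
    """A rule that lists predicates only for its relevant attributes."""
    predicates: Dict[str, Predicate]   # attribute name -> predicate
    predicted_class: str

    def matches(self, example: Dict[str, Union[float, str]]) -> bool:
        # Attributes absent from the list are irrelevant to this rule, so the
        # matching cost scales with the rule's length rather than with the
        # total number of attributes in the dataset.
        return all(pred.matches(example[name])
                   for name, pred in self.predicates.items())


# Hypothetical usage with one continuous and one discrete attribute.
rule = AttributeListRule(
    predicates={
        "petal_length": IntervalPredicate(1.0, 2.5),
        "colour": ValueSetPredicate(frozenset({"red", "blue"})),
    },
    predicted_class="class_A",
)
print(rule.matches({"petal_length": 1.8, "colour": "red", "width": 0.4}))  # True
```

The design point this sketch tries to illustrate, under the stated assumptions, is that match cost grows with the number of attributes a rule actually expresses rather than with the dimensionality of the dataset, which is presumably what makes such a representation attractive for large-scale mixed discrete-continuous domains.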


Published in

GECCO '09: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation
July 2009
2036 pages
ISBN: 978-1-60558-325-9
DOI: 10.1145/1569901

            Copyright © 2009 ACM


            Publisher

            Association for Computing Machinery

            New York, NY, United States

