Towards Efficient Training on Large Datasets for Genetic Programming

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3060)

Abstract

Genetic programming (GP) has the potential to provide unique solutions to a wide range of supervised learning problems. The technique, however, suffers from a widely acknowledged computational overhead. As a consequence, applications of GP are often confined to datasets consisting of hundreds rather than tens of thousands of training exemplars, limiting the widespread applicability of the approach. In this work we propose and thoroughly investigate a data sub-sampling algorithm – hierarchical dynamic subset selection – that filters the initial training dataset in parallel with the learning process, the motivation being to focus GP training on the most difficult or least recently visited exemplars. To do so, we build on the dynamic subset selection algorithm of Gathercole and extend it into a hierarchy of subset selections, matching the concept of a memory hierarchy supported by modern computers. Such an approach allows GP solutions to be trained on datasets with hundreds of thousands of exemplars in tens of minutes whilst matching the classification accuracies of more classical approaches.
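To make the two-level idea concrete, the sketch below illustrates one plausible reading of a hierarchical dynamic subset selection loop: a block level that favours least recently visited memory-sized blocks, and a Gathercole-style exemplar level that weights each exemplar by difficulty^d + age^a. The class and parameter names (HierarchicalDSS, block_size, subset_size, d_exp, a_exp) and the exact update rules are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of two-level (hierarchical) dynamic subset selection.
# The exemplar weighting difficulty**d + age**a follows Gathercole-style DSS;
# the block level, parameter names and update rules are assumptions made for
# illustration, not the paper's exact procedure.
import random


class HierarchicalDSS:
    def __init__(self, n_exemplars, block_size=1000, subset_size=50,
                 d_exp=1.0, a_exp=3.5, seed=0):
        self.rng = random.Random(seed)
        self.subset_size = subset_size
        self.d_exp, self.a_exp = d_exp, a_exp
        # Level 1: partition the exemplar indices into memory-sized blocks.
        idx = list(range(n_exemplars))
        self.blocks = [idx[i:i + block_size]
                       for i in range(0, n_exemplars, block_size)]
        # Per-exemplar difficulty (misclassification count) and age
        # (selections since last chosen); both start at 1.
        self.difficulty = [1] * n_exemplars
        self.age = [1] * n_exemplars
        # Per-block age, so seldom-visited blocks become more likely.
        self.block_age = [1] * len(self.blocks)

    def _roulette(self, candidates, weights, k):
        """Sample k distinct items with probability proportional to weight."""
        chosen = []
        pool = list(zip(candidates, weights))
        for _ in range(min(k, len(pool))):
            total = sum(w for _, w in pool)
            r = self.rng.uniform(0, total)
            acc = 0.0
            for j, (c, w) in enumerate(pool):
                acc += w
                if acc >= r:
                    chosen.append(pool.pop(j)[0])
                    break
            else:
                # Floating-point guard: fall back to the last candidate.
                chosen.append(pool.pop()[0])
        return chosen

    def select_block(self):
        """Level 1: pick one block, favouring least recently visited blocks."""
        weights = [a ** self.a_exp for a in self.block_age]
        (b,) = self._roulette(range(len(self.blocks)), weights, 1)
        for i in range(len(self.block_age)):
            self.block_age[i] += 1
        self.block_age[b] = 1
        return b

    def select_subset(self, block_id):
        """Level 2: DSS within the block, favouring hard or stale exemplars."""
        cand = self.blocks[block_id]
        weights = [self.difficulty[i] ** self.d_exp + self.age[i] ** self.a_exp
                   for i in cand]
        subset = self._roulette(cand, weights, self.subset_size)
        for i in cand:
            self.age[i] += 1
        for i in subset:
            self.age[i] = 1
        return subset

    def record_errors(self, exemplar_id, n_misclassified):
        """After fitness evaluation: difficulty tracks how often the current
        population misclassified this exemplar."""
        self.difficulty[exemplar_id] = max(1, n_misclassified)


# Usage: pick a block, run GP fitness evaluation over several subset
# selections within it (feeding back misclassification counts via
# record_errors), then move on to the next block.
dss = HierarchicalDSS(n_exemplars=100_000)
block = dss.select_block()
subset = dss.select_subset(block)  # exemplar indices for this evaluation
```

Confining the inner DSS roulette to one memory-resident block at a time is what keeps exemplar selection cheap even when the full dataset holds hundreds of thousands of exemplars, mirroring the memory-hierarchy motivation described in the abstract.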

References

  1. Bennett III, F.H., et al.: Building a Parallel Computer System for $18,000 that Performs a Half Peta-Flop per Day. In: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 1484–1490. Morgan Kaufmann, San Francisco (1999)

  2. Juillé, H., Pollack, J.B.: Massively Parallel Genetic Programming. In: Angeline, P.J., Kinnear, K.E. (eds.) Advances in Genetic Programming 2, ch. 17, pp. 339–358. MIT Press, Cambridge (1996)

  3. Koza, J.R., et al.: Evolving Computer Programs using Reconfigurable Gate Arrays and Genetic Programming. In: Proceedings of the ACM 6th International Symposium on Field Programmable Gate Arrays, pp. 209–219. ACM Press, New York (1998)

  4. Breiman, L.: Bagging Predictors. Machine Learning 24(2), 123–140 (1996)

  5. Freund, Y., Schapire, R.E.: A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences 55, 119–139 (1997)

  6. Hennessy, J.L., Patterson, D.A.: Computer Architecture: A Quantitative Approach, 3rd edn. Morgan Kaufmann, San Francisco (2002)

  7. Gathercole, C., Ross, P.: Dynamic Training Subset Selection for Supervised Learning in Genetic Programming. In: Davidor, Y., Männer, R., Schwefel, H.-P. (eds.) PPSN 1994. LNCS, vol. 866, pp. 312–321. Springer, Heidelberg (1994)

  8. Song, D., Heywood, M.I., Zincir-Heywood, A.N.: A Linear Genetic Programming Approach to Intrusion Detection. In: Cantú-Paz, E., Foster, J.A., Deb, K., Davis, L., Roy, R., O’Reilly, U.-M., Beyer, H.-G., Kendall, G., Wilson, S.W., Harman, M., Wegener, J., Dasgupta, D., Potter, M.A., Schultz, A., Dowsland, K.A., Jonoska, N., Miller, J., Standish, R.K. (eds.) GECCO 2003. LNCS, vol. 2724, pp. 2325–2336. Springer, Heidelberg (2003)

  9. Cramer, N.L.: A Representation for the Adaptive Generation of Simple Sequential Programs. In: Proceedings of the International Conference on Genetic Algorithms and Their Application, pp. 183–187 (1985)

  10. Nordin, P.: A Compiling Genetic Programming System that Directly Manipulates the Machine Code. In: Kinnear, K.E. (ed.) Advances in Genetic Programming, ch. 14, pp. 311–334. MIT Press, Cambridge (1994)

  11. Huelsbergen, L.: Finding General Solutions to the Parity Problem by Evolving Machine-Language Representations. In: Proceedings of the 3rd Conference on Genetic Programming, pp. 158–166. Morgan Kaufmann, San Francisco (1998)

  12. Heywood, M.I., Zincir-Heywood, A.N.: Dynamic Page-Based Linear Genetic Programming. IEEE Transactions on Systems, Man and Cybernetics – Part B: Cybernetics 32(3), 380–388 (2002)

  13. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge (1992)

  14. Elkan, C.: Results of the KDD 1999 Classifier Learning Contest. ACM SIGKDD Explorations 1(2), 63–64 (2000)

  15. UCI Machine Learning Repository (2003), http://www.ics.uci.edu/~mlearn/MLRepository.html

  16. Lichodzijewski, P., Zincir-Heywood, A.N., Heywood, M.I.: Host-Based Intrusion Detection Using Self-Organizing Maps. In: IEEE-INNS International Joint Conference on Neural Networks, pp. 1714–1719 (2002)

  17. Brameier, M., Banzhaf, W.: A Comparison of Linear Genetic Programming and Neural Networks in Medical Data Mining. IEEE Transactions on Evolutionary Computation 5(1), 17–26 (2001)

  18. Chittur, A.: Model Generation for Intrusion Detection System using Genetic Algorithms, 17 pages (2001), http://www1.cs.columbia.edu/ids/publications/

  19. Punch, B., Goodman, E.: lilgp Genetic Programming System, v 1.1 (1998), http://garage.cps.msu.edu/software/lil-gp/lilgp-index.html

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Curry, R., Heywood, M. (2004). Towards Efficient Training on Large Datasets for Genetic Programming. In: Tawfik, A.Y., Goodwin, S.D. (eds) Advances in Artificial Intelligence. Canadian AI 2004. Lecture Notes in Computer Science (LNAI), vol. 3060. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24840-8_12

  • DOI: https://doi.org/10.1007/978-3-540-24840-8_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22004-6

  • Online ISBN: 978-3-540-24840-8

  • eBook Packages: Springer Book Archive
