Towards Efficient Training on Large Datasets for Genetic Programming

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3060)

Abstract

Genetic programming (GP) has the potential to provide unique solutions to a wide range of supervised learning problems. The technique, however, suffers from a widely acknowledged computational overhead. As a consequence, applications of GP are often confined to datasets consisting of hundreds rather than tens of thousands of training exemplars, limiting the widespread applicability of the approach. In this work we propose and thoroughly investigate a data sub-sampling algorithm – hierarchical dynamic subset selection – that filters the initial training dataset in parallel with the learning process, the motivation being to focus GP training on the most difficult or least recently visited exemplars. To do so, we build on the dynamic subset selection algorithm of Gathercole and extend it into a hierarchy of subset selections, matching the concept of a memory hierarchy supported by modern computers. Such an approach allows GP solutions to be trained on datasets with hundreds of thousands of exemplars in tens of minutes whilst matching the classification accuracies of more classical approaches.
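To make the two-level idea concrete, the sketch below illustrates one plausible reading of a hierarchical dynamic subset selection loop: a block level that favours least recently visited memory-sized blocks, and a Gathercole-style exemplar level that weights each exemplar by difficulty^d + age^a. The class and parameter names (HierarchicalDSS, block_size, subset_size, d_exp, a_exp) and the exact update rules are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of two-level (hierarchical) dynamic subset selection.
# The exemplar weighting difficulty**d + age**a follows Gathercole-style DSS;
# the block level, parameter names and update rules are assumptions made for
# illustration, not the paper's exact procedure.
import random


class HierarchicalDSS:
    def __init__(self, n_exemplars, block_size=1000, subset_size=50,
                 d_exp=1.0, a_exp=3.5, seed=0):
        self.rng = random.Random(seed)
        self.subset_size = subset_size
        self.d_exp, self.a_exp = d_exp, a_exp
        # Level 1: partition the exemplar indices into memory-sized blocks.
        idx = list(range(n_exemplars))
        self.blocks = [idx[i:i + block_size]
                       for i in range(0, n_exemplars, block_size)]
        # Per-exemplar difficulty (misclassification count) and age
        # (selections since last chosen); both start at 1.
        self.difficulty = [1] * n_exemplars
        self.age = [1] * n_exemplars
        # Per-block age, so seldom-visited blocks become more likely.
        self.block_age = [1] * len(self.blocks)

    def _roulette(self, candidates, weights, k):
        """Sample k distinct items with probability proportional to weight."""
        chosen = []
        pool = list(zip(candidates, weights))
        for _ in range(min(k, len(pool))):
            total = sum(w for _, w in pool)
            r = self.rng.uniform(0, total)
            acc = 0.0
            for j, (c, w) in enumerate(pool):
                acc += w
                if acc >= r:
                    chosen.append(pool.pop(j)[0])
                    break
            else:
                # Floating-point guard: fall back to the last candidate.
                chosen.append(pool.pop()[0])
        return chosen

    def select_block(self):
        """Level 1: pick one block, favouring least recently visited blocks."""
        weights = [a ** self.a_exp for a in self.block_age]
        (b,) = self._roulette(range(len(self.blocks)), weights, 1)
        for i in range(len(self.block_age)):
            self.block_age[i] += 1
        self.block_age[b] = 1
        return b

    def select_subset(self, block_id):
        """Level 2: DSS within the block, favouring hard or stale exemplars."""
        cand = self.blocks[block_id]
        weights = [self.difficulty[i] ** self.d_exp + self.age[i] ** self.a_exp
                   for i in cand]
        subset = self._roulette(cand, weights, self.subset_size)
        for i in cand:
            self.age[i] += 1
        for i in subset:
            self.age[i] = 1
        return subset

    def record_errors(self, exemplar_id, n_misclassified):
        """After fitness evaluation: difficulty tracks how often the current
        population misclassified this exemplar."""
        self.difficulty[exemplar_id] = max(1, n_misclassified)


# Usage: pick a block, run GP fitness evaluation over several subset
# selections within it (feeding back misclassification counts via
# record_errors), then move on to the next block.
dss = HierarchicalDSS(n_exemplars=100_000)
block = dss.select_block()
subset = dss.select_subset(block)  # exemplar indices for this evaluation
```

Confining the inner DSS roulette to one memory-resident block at a time is what keeps exemplar selection cheap even when the full dataset holds hundreds of thousands of exemplars, mirroring the memory-hierarchy motivation described in the abstract.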

References

  1. Bennett III, F.H., et al.: Building a Parallel Computer System for $18,000 that Performs a Half Peta-Flop per Day. In: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 1484–1490. Morgan Kaufmann, San Francisco (1999)

  2. Juillé, H., Pollack, J.B.: Massively Parallel Genetic Programming. In: Angeline, P.J., Kinnear, K.E. (eds.) Advances in Genetic Programming 2, ch. 17, pp. 339–358. MIT Press, Cambridge (1996)

  3. Koza, J.R., et al.: Evolving Computer Programs using Reconfigurable Gate Arrays and Genetic Programming. In: Proceedings of the ACM 6th International Symposium on Field Programmable Gate Arrays, pp. 209–219. ACM Press, New York (1998)

  4. Breiman, L.: Bagging Predictors. Machine Learning 24(2), 123–140 (1996)

  5. Freund, Y., Schapire, R.E.: A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences 55, 119–139 (1997)

  6. Hennessy, J.L., Patterson, D.A.: Computer Architecture: A Quantitative Approach, 3rd edn. Morgan Kaufmann, San Francisco (2002)

  7. Gathercole, C., Ross, P.: Dynamic Training Subset Selection for Supervised Learning in Genetic Programming. In: Davidor, Y., Männer, R., Schwefel, H.-P. (eds.) PPSN 1994. LNCS, vol. 866, pp. 312–321. Springer, Heidelberg (1994)

  8. Song, D., Heywood, M.I., Zincir-Heywood, A.N.: A Linear Genetic Programming Approach to Intrusion Detection. In: Cantú-Paz, E., Foster, J.A., Deb, K., Davis, L., Roy, R., O’Reilly, U.-M., Beyer, H.-G., Kendall, G., Wilson, S.W., Harman, M., Wegener, J., Dasgupta, D., Potter, M.A., Schultz, A., Dowsland, K.A., Jonoska, N., Miller, J., Standish, R.K. (eds.) GECCO 2003. LNCS, vol. 2724, pp. 2325–2336. Springer, Heidelberg (2003)

  9. Cramer, N.L.: A Representation for the Adaptive Generation of Simple Sequential Programs. In: Proceedings of the International Conference on Genetic Algorithms and Their Application, pp. 183–187 (1985)

  10. Nordin, P.: A Compiling Genetic Programming System that Directly Manipulates the Machine Code. In: Kinnear, K.E. (ed.) Advances in Genetic Programming, ch. 14, pp. 311–334. MIT Press, Cambridge (1994)

  11. Huelsbergen, L.: Finding General Solutions to the Parity Problem by Evolving Machine-Language Representations. In: Proceedings of the 3rd Conference on Genetic Programming, pp. 158–166. Morgan Kaufmann, San Francisco (1998)

  12. Heywood, M.I., Zincir-Heywood, A.N.: Dynamic Page-Based Linear Genetic Programming. IEEE Transactions on Systems, Man and Cybernetics – Part B: Cybernetics 32(3), 380–388 (2002)

  13. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge (1992)

  14. Elkan, C.: Results of the KDD 1999 Classifier Learning Contest. ACM SIGKDD Explorations 1(2), 63–64 (2000)

  15. UCI Machine Learning Repository (2003), http://www.ics.uci.edu/~mlearn/MLRepository.html

  16. Lichodzijewski, P., Zincir-Heywood, A.N., Heywood, M.I.: Host-Based Intrusion Detection Using Self-Organizing Maps. In: IEEE-INNS International Joint Conference on Neural Networks, pp. 1714–1719 (2002)

  17. Brameier, M., Banzhaf, W.: A Comparison of Linear Genetic Programming and Neural Networks in Medical Data Mining. IEEE Transactions on Evolutionary Computation 5(1), 17–26 (2001)

  18. Chittur, A.: Model Generation for Intrusion Detection System using Genetic Algorithms, 17 pages (2001), http://www1.cs.columbia.edu/ids/publications/

  19. Punch, B., Goodman, E.: lilgp Genetic Programming System, v 1.1 (1998), http://garage.cps.msu.edu/software/lil-gp/lilgp-index.html

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Curry, R., Heywood, M. (2004). Towards Efficient Training on Large Datasets for Genetic Programming. In: Tawfik, A.Y., Goodwin, S.D. (eds) Advances in Artificial Intelligence. Canadian AI 2004. Lecture Notes in Computer Science (LNAI), vol. 3060. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24840-8_12

  • DOI: https://doi.org/10.1007/978-3-540-24840-8_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22004-6

  • Online ISBN: 978-3-540-24840-8

  • eBook Packages: Springer Book Archive
