Abstract
The use of machine learning techniques to analyse data automatically is becoming increasingly widespread. In this paper we examine the use of Genetic Programming and a Genetic Algorithm to pre-process data before it is classified with the C4.5 decision tree learning algorithm. Genetic Programming is used to construct new features from those available in the data, a potentially significant process for data mining since it can capture hidden relationships between features. The Genetic Algorithm is then used to determine which of these features are the most predictive. Using ten well-known datasets, we show that our approach provides marked improvement over C4.5 alone in a number of cases.
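The two-stage pre-processing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Constructed features are represented as expression trees of the kind Genetic Programming evolves. The GP search itself is omitted, and the example tree is hypothetical. A bitstring Genetic Algorithm then searches over masks that select among the original and constructed features. To keep the sketch dependency-free, fitness is leave-one-out 1-nearest-neighbour accuracy, a swapped-in stand-in for training and testing C4.5. All function names and parameter values (population size, mutation rate, tournaments of two) are illustrative assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple, Union

# A constructed feature is an expression tree: a terminal name such as "x0",
# or a tuple (operator, left_subtree, right_subtree). In the paper's setup such
# trees would be evolved by GP; here they are written by hand (hypothetical).
Tree = Union[str, tuple]


def evaluate_tree(tree: Tree, row: Dict[str, float]) -> float:
    """Evaluate an expression tree on one example (a mapping name -> value)."""
    if isinstance(tree, str):
        return row[tree]
    op, left, right = tree
    a, b = evaluate_tree(left, row), evaluate_tree(right, row)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    if op == "/":
        # Protected division, a common GP convention: x / 0 -> 1.0
        return a / b if b != 0 else 1.0
    raise ValueError(f"unknown operator {op!r}")


def loo_1nn_accuracy(X: List[List[float]], y: List[int], mask: Sequence[bool]) -> float:
    """Leave-one-out 1-nearest-neighbour accuracy using only the masked columns.

    Assumed stand-in for evaluating C4.5 on the selected features.
    """
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = float("inf"), None
        for k, xk in enumerate(X):
            if k == i:
                continue  # leave the query example out
            d = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if d < best_d:
                best_d, best_label = d, y[k]
        if best_label == y[i]:
            correct += 1
    return correct / len(X)


def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0) -> Tuple[tuple, float]:
    """Bitstring GA for feature selection: returns (best mask, its fitness)."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(m):
        return loo_1nn_accuracy(X, y, m)

    # Seed with the all-features mask so the result never scores below that baseline.
    pop = [tuple([True] * n)] + [
        tuple(rng.random() < 0.5 for _ in range(n)) for _ in range(pop_size - 1)
    ]
    scores = [fitness(m) for m in pop]
    best_i = max(range(pop_size), key=scores.__getitem__)
    best, best_fit = pop[best_i], scores[best_i]

    def tournament():
        a, b = rng.sample(range(pop_size), 2)  # binary tournament selection
        return pop[a] if scores[a] >= scores[b] else pop[b]

    for _ in range(generations):
        children = [best]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            genes = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]  # uniform crossover
            children.append(tuple((not g) if rng.random() < p_mut else g for g in genes))  # bit-flip mutation
        pop = children
        scores = [fitness(m) for m in pop]
        for m, s in zip(pop, scores):
            if s > best_fit:
                best, best_fit = m, s
    return best, best_fit


if __name__ == "__main__":
    data_rng = random.Random(1)
    names = ["x0", "x1", "x2", "x3"]
    rows = [{nm: data_rng.uniform(-1, 1) for nm in names} for _ in range(40)]
    labels = [int(r["x0"] * r["x1"] > 0) for r in rows]  # class depends on an interaction
    constructed = [("*", "x0", "x1")]  # hypothetical GP output
    X = [[r[nm] for nm in names] + [evaluate_tree(t, r) for t in constructed] for r in rows]
    mask, acc = ga_select(X, labels)
    print([nm for nm, keep in zip(names + ["x0*x1"], mask) if keep], acc)
```

Seeding the population with the all-features mask and keeping the best individual each generation (elitism) means the returned subset scores at least as well as using every feature, under this assumed fitness measure.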
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Smith, M.G., Bull, L. (2003). Feature Construction and Selection Using Genetic Programming and a Genetic Algorithm. In: Ryan, C., Soule, T., Keijzer, M., Tsang, E., Poli, R., Costa, E. (eds) Genetic Programming. EuroGP 2003. Lecture Notes in Computer Science, vol 2610. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36599-0_21
DOI: https://doi.org/10.1007/3-540-36599-0_21
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00971-9
Online ISBN: 978-3-540-36599-0
eBook Packages: Springer Book Archive