Abstract
Genetic Programming (GP) has shown promise in feature construction, where high-level features are formed by combining original features using predefined functions or operators. Multiple feature construction methods have been proposed for high-dimensional data with thousands of features. Results of these methods show that several constructed features can maintain or even improve the discriminating ability of the original feature set. However, some features may be better than others at distinguishing instances of one class from the other classes. Therefore, it may be more difficult to construct a well-discriminating feature by combining features that are relevant to different classes. In this study, we propose a new GP-based feature construction method called CDFC that constructs multiple features, each of which focuses on distinguishing one class from the other classes. We propose a new representation for class-dependent feature construction and a new fitness function to better evaluate the constructed feature set. Results on eight datasets of varying difficulty showed that the features constructed by CDFC can improve the discriminating ability of thousands of original features in most cases. Results also showed that CDFC is more effective and efficient than the hybrid MGPFC method, which had previously been shown to outperform standard GP for feature construction.
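The abstract describes CDFC only at a high level, so the following is a minimal Python sketch of the general class-dependent idea, not the authors' representation or fitness function. It assumes (1) one GP expression tree per class, built from hypothetical `Leaf`/`Node` types over original feature indices, and (2) a Fisher-style one-vs-rest separation ratio as a stand-in score. The names `evaluate`, `one_vs_rest_score`, and `class_dependent_fitness`, and the toy data, are illustrative only.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple, Union


@dataclass
class Leaf:
    """Terminal node: reads one original feature by index (hypothetical representation)."""
    index: int


@dataclass
class Node:
    """Function node: applies a binary operator to the values of its two subtrees."""
    op: Callable[[float, float], float]
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def evaluate(tree: Tree, x: Sequence[float]) -> float:
    """Compute the constructed feature value for a single instance x."""
    if isinstance(tree, Leaf):
        return x[tree.index]
    return tree.op(evaluate(tree.left, x), evaluate(tree.right, x))


def _mean_var(values: List[float]) -> Tuple[float, float]:
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def one_vs_rest_score(tree: Tree, X, y, target) -> float:
    """Assumed stand-in measure (not CDFC's): Fisher-style ratio of `target` vs all other classes."""
    values = [evaluate(tree, x) for x in X]
    inside = [v for v, label in zip(values, y) if label == target]
    outside = [v for v, label in zip(values, y) if label != target]
    if not inside or not outside:
        raise ValueError("need instances both inside and outside the target class")
    mean_in, var_in = _mean_var(inside)
    mean_out, var_out = _mean_var(outside)
    # Small epsilon avoids division by zero when both groups are constant.
    return (mean_in - mean_out) ** 2 / (var_in + var_out + 1e-12)


def class_dependent_fitness(trees: Sequence[Tree], X, y, classes) -> float:
    """Average one-vs-rest score, where trees[i] is responsible for classes[i]."""
    scores = [one_vs_rest_score(t, X, y, c) for t, c in zip(trees, classes)]
    return sum(scores) / len(scores)


# Made-up toy data: 3 classes, each dominated by one original feature.
X = [[1.0, 0.0, 0.0], [1.2, 0.1, 0.0],
     [0.0, 1.0, 0.1], [0.1, 1.1, 0.0],
     [0.0, 0.0, 1.0], [0.1, 0.0, 1.2]]
y = [0, 0, 1, 1, 2, 2]

# One constructed feature per class, e.g. x0 - x1 is meant to separate class 0.
trees = [Node(operator.sub, Leaf(0), Leaf(1)),
         Node(operator.sub, Leaf(1), Leaf(2)),
         Node(operator.sub, Leaf(2), Leaf(0))]

fitness = class_dependent_fitness(trees, X, y, classes=[0, 1, 2])
print(fitness)
```

Pairing each tree with a single target class mirrors the abstract's argument that a feature built from features relevant to one class can discriminate better than one mixing features relevant to different classes. A full GP run would also need population initialisation, crossover, mutation, and selection, which are omitted here.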
Notes
1. These datasets are publicly available at http://www.gems-system.org and http://csse.szu.edu.cn/staff/zhuzx/Datasets.html.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Tran, B., Xue, B., Zhang, M. (2017). Class Dependent Multiple Feature Construction Using Genetic Programming for High-Dimensional Data. In: Peng, W., Alahakoon, D., Li, X. (eds) AI 2017: Advances in Artificial Intelligence. AI 2017. Lecture Notes in Computer Science, vol 10400. Springer, Cham. https://doi.org/10.1007/978-3-319-63004-5_15
DOI: https://doi.org/10.1007/978-3-319-63004-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63003-8
Online ISBN: 978-3-319-63004-5
eBook Packages: Computer Science, Computer Science (R0)