Class Dependent Multiple Feature Construction Using Genetic Programming for High-Dimensional Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10400)

Abstract

Genetic Programming (GP) has shown promise in feature construction, where high-level features are formed by combining original features using predefined functions or operators. Multiple feature construction methods have been proposed for high-dimensional data with thousands of features. Results of these methods show that several constructed features can maintain or even improve the discriminating ability of the original feature set. However, some features may be better than others at distinguishing instances of one class from the other classes. It may therefore be more difficult to construct a strongly discriminating feature when combining features that are relevant to different classes. In this study, we propose a new GP-based feature construction method called CDFC that constructs multiple features, each of which focuses on distinguishing one class from the other classes. We propose a new representation for class-dependent feature construction and a new fitness function to better evaluate the constructed feature set. Results on eight datasets of varying difficulty show that the features constructed by CDFC can improve on the discriminating ability of thousands of original features in most cases. Results also show that CDFC is more effective and efficient than the hybrid MGPFC method, which was shown to perform better than standard GP for feature construction.
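The idea of building one constructed feature per class can be illustrated with a minimal sketch. It is not the CDFC representation or fitness function, neither of which is specified in the abstract. The nested-tuple tree encoding, the function set in `FUNCTIONS`, the helper names, and the `one_vs_rest_gap` score are all assumptions made for illustration, and the trees are written by hand rather than evolved by GP.

```python
import operator

# Function set for the expression trees (an assumption; the abstract only
# mentions "predefined functions or operators").
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def evaluate(tree, x):
    """Evaluate a nested-tuple expression tree on one instance x."""
    if tree[0] == "feat":          # terminal node: ("feat", index)
        return x[tree[1]]
    op, left, right = tree         # internal node: (name, left, right)
    return FUNCTIONS[op](evaluate(left, x), evaluate(right, x))


def construct_features(trees_by_class, X):
    """Map each instance to one constructed feature per class."""
    classes = sorted(trees_by_class)
    return [[evaluate(trees_by_class[c], x) for c in classes] for x in X]


def one_vs_rest_gap(values, y, target):
    """Hypothetical score: gap between the mean of the target class and the
    mean of all other classes on a single constructed feature."""
    inside = [v for v, label in zip(values, y) if label == target]
    outside = [v for v, label in zip(values, y) if label != target]
    return abs(sum(inside) / len(inside) - sum(outside) / len(outside))


# Toy data: 4 instances, 3 original features, 2 classes.
X = [[1.0, 0.0, 2.0], [2.0, 0.0, 1.0], [0.0, 3.0, 1.0], [0.0, 4.0, 2.0]]
y = [0, 0, 1, 1]

# One hand-written tree per class, standing in for evolved individuals.
trees_by_class = {
    0: ("sub", ("feat", 0), ("feat", 1)),   # meant to separate class 0
    1: ("mul", ("feat", 1), ("feat", 2)),   # meant to separate class 1
}

Z = construct_features(trees_by_class, X)
gap0 = one_vs_rest_gap([row[0] for row in Z], y, 0)
gap1 = one_vs_rest_gap([row[1] for row in Z], y, 1)
print(Z, gap0, gap1)
```

In a GP setting, a score of this kind would be computed for candidate trees and used to guide selection. The mean gap is used here only because it is easy to follow on toy data.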

Notes

  1. These datasets are publicly available at http://www.gems-system.org and http://csse.szu.edu.cn/staff/zhuzx/Datasets.html.

Author information

Corresponding author

Correspondence to Bing Xue.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Tran, B., Xue, B., Zhang, M. (2017). Class Dependent Multiple Feature Construction Using Genetic Programming for High-Dimensional Data. In: Peng, W., Alahakoon, D., Li, X. (eds) AI 2017: Advances in Artificial Intelligence. AI 2017. Lecture Notes in Computer Science, vol 10400. Springer, Cham. https://doi.org/10.1007/978-3-319-63004-5_15

  • DOI: https://doi.org/10.1007/978-3-319-63004-5_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63003-8

  • Online ISBN: 978-3-319-63004-5

  • eBook Packages: Computer Science, Computer Science (R0)
