Feature construction as a bi-level optimization problem

  • Original Article
  • Neural Computing and Applications

Abstract

Feature selection and construction are important preprocessing techniques in data mining. They allow not only dimensionality reduction but also improvements in classification accuracy and efficiency. While feature selection consists in selecting a subset of relevant features from the original feature set, feature construction corresponds to the generation of new high-level features, called constructed features, each of which is a combination of a subset of original features. Based on these definitions, feature construction can be seen as a bi-level optimization problem in which a feature subset should be defined first and the corresponding (near) optimal combination of the selected features should then be found. Motivated by this observation, we propose in this paper a bi-level evolutionary approach for feature construction. The basic idea of our algorithm, named bi-level feature construction genetic algorithm (BFC-GA), is to evolve an upper-level population for the task of feature selection, while optimizing the feature combinations at the lower level by evolving a follower population. It is worth noting that for each upper-level individual (feature subset), a whole lower-level population is optimized to find the corresponding (near) optimal feature combination (constructed feature). In this way, BFC-GA is able to output a set of optimized constructed features that can be very informative to the considered classifier. A detailed experimental study has been conducted on a set of commonly used datasets with varying dimensions. The statistical analysis of the obtained results shows the competitiveness and the outperformance of our bi-level feature construction approach with respect to many state-of-the-art algorithms.


Notes

  1. http://www.minitab.com/en-us/.

References

  1. Gheyas IA, Smith LS (2010) Feature subset selection in large dimensionality domains. Pattern Recognit 43(1):5–13

  2. Liu H, Motoda H (1998) Feature extraction, construction and selection: a data mining perspective. Kluwer Academic Publishers, Norwell. ISBN 978-1-4615-5725-8

  3. Cerrada M, Sanchez RV, Pacheco F, Cabrera D, Zurita G, Li C (2016) Hierarchical feature selection based on relative dependency for gear fault diagnosis. Appl Intell 44(3):687–703

  4. Pes B (2019) Ensemble feature selection for high-dimensional data: a stability analysis across multiple domains. Neural Comput Appl. https://doi.org/10.1007/s00521-019-04082-3

  5. Muharram M, Smith G (2005) Evolutionary constructive induction. IEEE Trans Knowl Data Eng 17(11):1518–1528

  6. Neshatian K, Zhang M, Andreae P (2012) A filter approach to multiple feature construction for symbolic learning classifiers using genetic programming. IEEE Trans Evol Comput 16(5):645–661

  7. Colson B, Marcotte P, Savard G (2007) An overview of bilevel optimization. Ann Oper Res 153(1):235–256

  8. Bennett KP, Kunapuli G, Hu J, Pang J-S (2008) Bilevel optimization and machine learning. In: Proceedings of the IEEE world congress on computational intelligence, pp 25–47

  9. Xue B, Zhang M, Browne WN, Yao X (2016) A Survey on evolutionary computation approaches to feature selection. IEEE Trans Evol Comput 20(4):606–626

  10. Vergara J, Estévez P (2014) A review of feature selection methods based on mutual information. Neural Comput Appl 24:175–186

  11. Canuto AMP, Nascimento DSC (2012) A genetic-based approach to features selection for ensembles using a hybrid and adaptive fitness function. In: Proceedings of the international joint conference on neural networks (IJCNN), pp 1–8

  12. Zhu ZX, Ong Y-S, Dash M (2007) Wrapper-filter feature selection algorithm using a memetic framework. IEEE Trans Syst Man Cybern B 37(1):70–76

  13. Bermejo P, de la Ossa L, Gámez JA, Puerta JM (2012) Fast wrapper feature subset selection in high-dimensional datasets by means of filter reranking. Knowl-Based Syst 25(1):35–44

  14. Ghosh M, Guha R, Sarkar R, Abraham A (2019) A wrapper-filter feature selection technique based on ant colony optimization. Neural Comput Appl. https://doi.org/10.1007/s00521-019-04171-3

  15. He J, Bi Y, Ding L, Li Z, Wang S (2016) Unsupervised feature selection based on decision graph. Neural Comput Appl 28(10):1–13

  16. Shannon C, Weaver W (1948) The mathematical theory of communication, 144. The University of Illinois Press, Champaign ISBN:978-0252725487

  17. Kamath U, De Jong K, Shehu A (2014) Effective automated feature construction and selection for classification of biological sequences. PLoS ONE 9(7):e99982

  18. Ahmed S, Zhang M, Peng L (2014) A new gp-based wrapper feature construction approach to classification and biomarker identification. In: Proceedings of the IEEE congress on evolutionary computation, pp 2756–2763

  19. Tran B, Zhang M, Xue B (2016) Multiple feature construction in classification on high-dimensional data using GP. In: Proceedings of IEEE symposium series on computational intelligence, pp 1–8

  20. Hammami M, Bechikh S, Hung C-C, Ben Said L (2018) A multi-objective hybrid filter-wrapper evolutionary approach for feature construction on high-dimensional data. In: Proceedings of IEEE congress on evolutionary computation, pp 1–8

  21. Sahin D, Kessentini M, Bechikh S, Deb K (2014) Code-smell detection as a bilevel problem. ACM Trans Softw Eng Methodol 24(1):1–44

  22. Hammami M, Bechikh S, Hung C-C, Ben Said L (2019) A multi-objective hybrid filter-wrapper evolutionary approach for feature selection. Memet Comput 11(2):193–208

  23. Chaabani A, Bechikh S, Ben Said L (2015) A co-evolutionary decomposition-based algorithm for bi-level combinatorial optimization. In: Proceedings of IEEE congress on evolutionary computation, pp 1659–1666

  24. Patterson G, Zhang M (2007) Fitness functions in genetic programming for classification with unbalanced data. In: Proceedings of advances in artificial intelligence, pp 769–775

  25. Arora JS (2017) Introduction to optimum design. Academic Press. ISBN: 9780128009185

  26. Frank A, Asuncion A (2010) UCI machine learning repository. [Online]. Available: https://archive.ics.uci.edu/ml/datasets.html

  27. Ding C, Peng H (2003) Minimum redundancy feature selection from microarray gene expression data. In: IEEE bioinformatics conference, pp 523–528

  28. Gallo CA, Cecchini RL, Carballido JA, Micheletto S, Ponzoni I (2015) Discretization of gene expression data revised. Brief Bioinform 17(5):758–770

  29. Tran B, Xue B, Zhang M (2015) Genetic programming for feature construction and selection in classification on high-dimensional data. Memet Comput 8(1):3–15

  30. Tran B, Xue B, Zhang M (2019) Genetic programming for multiple-feature construction on high-dimensional classification. Pattern Recognit 93(1):404–417

  31. Statnikov A, Aliferis CF, Tsamardinos I, Hardin D, Levy S (2005) A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics 21(5):631–643

  32. Xue B, Zhang M, Browne WN (2013) Particle swarm optimization for feature selection in classification: a multi-objective approach. IEEE Trans Cybern 43(6):1656–1671

  33. Eiben AE, Smit S (2011) Parameter tuning for configuring and analyzing evolutionary algorithms. Swarm Evol Comput 1(1):19–31

  34. Ross PJ (1996) Taguchi techniques for quality engineering: loss function, orthogonal experiments, parameter and tolerance design. McGraw Hill Professional, New York. ISBN: 978-0070538665

  35. Phadke MS (1995) Quality engineering using robust design. Prentice Hall PTR. ISBN: 978-0137451678

  36. Butler-Yeoman T, Xue B, Zhang M (2015) Particle swarm optimisation for feature selection: a hybrid filter-wrapper approach. In: Proceedings of the IEEE congress on evolutionary computation, pp 2428–2435

  37. Derrac J, García S, Molina D, Herrera F (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol Comput 1(1):3–18

  38. Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York ISBN:9780470316801

  39. Brock G, Pihur V, Datta S, Datta S (2008) clValid: an R package for cluster validation. J Stat Softw 25(4):1–22

  40. Peralta D, Río SD, Ramírez-Gallego S, Triguero I, Benítez JM, Herrera F (2015) Evolutionary feature selection for big data classification: a MapReduce approach. Math Probl Eng 2015(1):1–11

Author information

Corresponding author

Correspondence to Slim Bechikh.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

As we have mentioned in Sect. 3, the main goal of CODBA [23] is to reduce the high computational cost of the lower-level task by using decomposition, coevolution, and multi-threading. Originally, CODBA was proposed to solve a bi-level production–distribution problem in supply chain management. Its basic scheme is illustrated in Fig. 8, and its step-by-step procedure is described in subsections A.1 and A.2. As BFC-GA is an adapted version of CODBA for the bi-level feature construction problem, we highlight in Table 8 the main differences between these two algorithms.

1.1 A.1 CODBA upper-level main principle

The step-by-step upper-level procedure of CODBA is described as follows (a minimal sketch is given after the list):

Step 1 (Initialization scheme): Randomly generate an initial parent population of N individuals encoded with upper-level variables. Thereafter, the lower-level optimization problem is solved for each individual to identify its optimal lower-level solution. The upper-level fitness is then assigned based on both the upper-level objective value and the constraints, since the lower-level problem acts as a constraint on the upper-level one.

Step 2 (Upper-level parent selection): Choose (N/2) population members from the parent population using tournament selection.

Step 3 (Variation at the upper level): Apply crossover and mutation operators to create the offspring population.

Step 4 (Lower-level optimization): Solve the lower-level optimization problem for each offspring using the decomposition-based coevolutionary parallel scheme (cf. the following subsection).

Step 5 (Offspring evaluation): Combine the upper-level parents and offspring into a population \(R_t\) and evaluate them based on the upper-level objective function and the constraints.

Step 6 (Environmental selection): Fill the new upper-level population using a replacement strategy: the new upper-level population is formed with the N best solutions of \(R_t\). If the stopping criterion is met, then return the best upper-level solution; otherwise, return to Step 2.
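To make the nesting concrete, the following sketch mirrors Steps 1–6. It is a minimal Python illustration written for this summary, not the authors' implementation: `init_individual`, `crossover`, `mutate`, `upper_fitness`, and `solve_lower_level` are assumed placeholders, with `solve_lower_level` standing for the whole lower-level optimization of subsection A.2 and `crossover` assumed to return an iterable of children.

```python
import random

def codba_upper_level(n_pop, max_gen, init_individual, crossover, mutate,
                      upper_fitness, solve_lower_level):
    """Sketch of CODBA's upper-level loop (Appendix A.1); placeholder callables."""
    # Step 1: random initialization; each individual is paired with its
    # optimized lower-level solution before the upper-level fitness is assigned.
    population = []
    for _ in range(n_pop):
        ind = init_individual()
        low = solve_lower_level(ind)
        population.append((ind, low, upper_fitness(ind, low)))

    for _ in range(max_gen):
        # Step 2: tournament selection of N/2 parents.
        parents = [max(random.sample(population, 2), key=lambda s: s[2])
                   for _ in range(n_pop // 2)]

        # Steps 3-5: variation, nested lower-level optimization, evaluation.
        offspring = []
        for i in range(0, len(parents) - 1, 2):
            for child in crossover(parents[i][0], parents[i + 1][0]):
                child = mutate(child)
                low = solve_lower_level(child)          # Step 4
                offspring.append((child, low, upper_fitness(child, low)))

        # Step 6: environmental selection keeps the N best of R_t.
        r_t = population + offspring
        population = sorted(r_t, key=lambda s: s[2], reverse=True)[:n_pop]

    return max(population, key=lambda s: s[2])
```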

1.2 A.2 CODBA lower-level main principle

In order to cope with the high computational cost, the lower-level population is decomposed into M well-distributed subpopulations over the lower-level search space. Each subpopulation can be seen as a cluster of lower-level solutions whose centroids are well distributed so as to cover the whole search space as much as possible. In this way, each subpopulation is responsible for a specific region. All subpopulations coevolve in parallel using M threads (one thread per subpopulation). Moreover, the best solution of each subpopulation is stored in an archive. These best solutions are subsequently exchanged between the different subpopulations via crossover; recombination with the archive solutions allows information exchange between the subpopulations with the aim of finding a better lower-level global optimum. The step-by-step procedure of the lower-level optimization algorithm is described as follows (a minimal sketch is given after the list):

Step 1 (Lower-level decomposition): For each upper-level solution, M well-distributed lower-level subpopulations are generated over the whole discrete decision space, using the decomposition method for discrete decision spaces described in Section III-B of [23]. Once these subpopulations are generated, each subpopulation member is evaluated using the lower-level objective function and constraints. It is important to note that all subpopulations are evolved simultaneously, with one thread per subpopulation.

Step 2 (Lower-level parent selection): Choose (SPS/2) members from each lower-level parent subpopulation using tournament selection, where SPS is the subpopulation size.

Step 3 (Variation at the lower level): Perform the crossover and mutation operations in order to create an offspring subpopulation for each parent subpopulation.

Step 4 (Offspring evaluation): Combine each parent subpopulation with its corresponding offspring population and evaluate them using the lower-level objective function and the constraints.

Step 5 (Environmental selection): Fill each new lower-level subpopulation using a replacement strategy. In fact, each new lower-level subpopulation is formed with the SPS best solutions of the combined one. If the stopping criterion is met, then store the best found lower-level solution in the archive; otherwise, return to Step 2.

Step 6 (Coevolution): Each subpopulation member is crossed over with one of the best archive members of the other subpopulations, yielding an offspring population for each subpopulation. Thereafter, each subpopulation is combined with its corresponding offspring population and updated by selecting the SPS best solutions. This process is repeated until the best lower-level fitness value has not improved for K generations or MaxGenCoEvol is reached, where MaxGenCoEvol is the maximum allowed number of coevolution generations. Once coevolution terminates, the global optimum is returned to the upper level to evaluate the upper-level solutions.
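The sketch below (again a minimal Python illustration under assumptions, not the authors' code) mirrors the decomposition, threaded evolution, archive exchange, and stopping rule described above. Here `make_subpopulations` stands for the decomposition method of [23, Section III-B], `evolve_subpop` runs Steps 2–5 on one subpopulation and returns it together with its best solution, and `crossover` and `lower_fitness` are hypothetical placeholders; at least two subpopulations are assumed.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def codba_lower_level(upper_solution, make_subpopulations, evolve_subpop,
                      crossover, lower_fitness, m_subpops, sps,
                      k_stall, max_gen_coevol):
    """Sketch of CODBA's decomposition-based coevolutionary lower level (A.2)."""
    # Step 1: M well-distributed subpopulations over the lower-level space,
    # each evolved (Steps 2-5) in its own thread.
    subpops = make_subpopulations(upper_solution, m_subpops, sps)
    with ThreadPoolExecutor(max_workers=m_subpops) as pool:
        results = list(pool.map(lambda sp: evolve_subpop(sp, upper_solution), subpops))
    subpops = [sp for sp, _ in results]
    archive = [best for _, best in results]        # one best solution per subpopulation

    # Step 6: coevolution via crossover with other subpopulations' archive members.
    best_so_far, stall = max(archive, key=lower_fitness), 0
    for _ in range(max_gen_coevol):
        new_subpops = []
        for idx, subpop in enumerate(subpops):
            others = [a for j, a in enumerate(archive) if j != idx]
            offspring = [child for ind in subpop
                         for child in crossover(ind, random.choice(others))]
            combined = subpop + offspring
            new_subpops.append(sorted(combined, key=lower_fitness, reverse=True)[:sps])
        subpops = new_subpops
        archive = [max(sp, key=lower_fitness) for sp in subpops]

        current_best = max(archive, key=lower_fitness)
        if lower_fitness(current_best) > lower_fitness(best_so_far):
            best_so_far, stall = current_best, 0
        else:
            stall += 1
            if stall >= k_stall:                   # no improvement for K generations
                break
    return best_so_far                             # returned to the upper level
```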

Fig. 8 Basic idea of CODBA [23]

1.3 A.3 Main differences between BFC-GA and CODBA

A comparison between BFC-GA and CODBA is presented in Table 8.

Table 8 Main differences between BFC-GA and CODBA

Appendix B

The orthogonal array \(L_{27}(3^8)\), corresponding to 27 experiments, eight variables, and three levels, is presented in Table 9. The experimental results for the constructed features, the selected features, and the selected and constructed features are presented in Tables 10, 11, and 12, respectively (a minimal construction sketch for such an array is given after the list of table captions).

Table 9 The orthogonal array \(L_{27}\)
Table 10 Best, average, and Std of the accuracy of the constructed features
Table 11 Best, average, and Std of the accuracy of the selected features
Table 12 Best, average, and Std of the accuracy of the selected and constructed features
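For readers who wish to reproduce such a design, the sketch below builds a 27-run, three-level array with eight pairwise-orthogonal columns using the standard GF(3) linear-combination construction and checks its strength-2 property. It is an illustrative sketch only, written for this summary; the particular column assignment used in Table 9 (and by Taguchi software such as Minitab) may differ.

```python
import itertools
import numpy as np

def l27_orthogonal_array(n_columns=8):
    """Construct an OA(27, n_columns, 3, 2): 27 runs, 3 levels, strength 2."""
    # 27 runs: the full factorial of three base factors a, b, c in {0, 1, 2}.
    runs = np.array(list(itertools.product(range(3), repeat=3)))

    # Coefficient vectors whose first nonzero entry is 1 (13 of them);
    # non-proportional vectors give pairwise-orthogonal columns over GF(3).
    coeffs = [v for v in itertools.product(range(3), repeat=3)
              if any(v) and v[next(i for i, x in enumerate(v) if x)] == 1]

    columns = [(runs @ np.array(v)) % 3 for v in coeffs[:n_columns]]
    return np.column_stack(columns)                 # shape (27, n_columns)

def is_orthogonal(array):
    """Check strength-2 orthogonality: every level pair appears equally often."""
    n_rows, n_cols = array.shape
    for i, j in itertools.combinations(range(n_cols), 2):
        pairs = list(zip(array[:, i], array[:, j]))
        counts = {p: pairs.count(p) for p in set(pairs)}
        if set(counts.values()) != {n_rows // 9}:   # 27 / 3^2 = 3 per level pair
            return False
    return True

oa = l27_orthogonal_array(8)
print(oa.shape)            # (27, 8)
print(is_orthogonal(oa))   # True
```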

About this article

Cite this article

Hammami, M., Bechikh, S., Louati, A. et al. Feature construction as a bi-level optimization problem. Neural Comput & Applic 32, 13783–13804 (2020). https://doi.org/10.1007/s00521-020-04784-z
