Abstract
Feature selection and feature construction are important preprocessing techniques in data mining. They allow not only dimensionality reduction but also improved classification accuracy and efficiency. While feature selection consists in selecting a subset of relevant features from the original feature set, feature construction corresponds to the generation of new high-level features, called constructed features, each of which is a combination of a subset of original features. Based on these definitions, feature construction can be seen as a bi-level optimization problem: the feature subset should be defined first, and then the corresponding (near) optimal combination of the selected features should be found. Motivated by this observation, we propose in this paper a bi-level evolutionary approach for feature construction. The basic idea of our algorithm, named bi-level feature construction genetic algorithm (BFC-GA), is to evolve an upper-level population for the task of feature selection, while optimizing the feature combinations at the lower level by evolving a follower population. It is worth noting that for each upper-level individual (feature subset), a whole lower-level population is optimized to find the corresponding (near) optimal feature combination (constructed feature). In this way, BFC-GA is able to output a set of optimized constructed features that can be highly informative to the considered classifier. A detailed experimental study has been conducted on a set of commonly used datasets with varying dimensions. The statistical analysis of the obtained results shows that our bi-level feature construction approach is competitive with, and often outperforms, many state-of-the-art algorithms.







References
Gheyas IA, Smith LS (2010) Feature subset selection in large dimensionality domains. Pattern Recognit 43(1):5–13
Liu H, Motoda H (1998) Feature extraction, construction and selection: a data mining perspective. Kluwer Academic Publishers, Norwell. ISBN: 978-1-4615-5725-8
Cerrada M, Sanchez RV, Pacheco F, Cabrera D, Zurita G, Li C (2016) Hierarchical feature selection based on relative dependency for gear fault diagnosis. Appl Intell 44(3):687–703
Pes B (2019) Ensemble feature selection for high-dimensional data: a stability analysis across multiple domains. Neural Comput Appl. https://doi.org/10.1007/s00521-019-04082-3
Muharram M, Smith G (2005) Evolutionary constructive induction. IEEE Trans Knowl Data Eng 17(11):1518–1528
Neshatian K, Zhang M, Andreae P (2012) A filter approach to multiple feature construction for symbolic learning classifiers using genetic programming. IEEE Trans Evol Comput 16(5):645–661
Colson B, Marcotte P, Savard G (2007) An overview of bilevel optimization. Ann Oper Res 153(1):235–256
Bennett KP, Kunapuli G, Hu J, Pang J-S (2008) Bilevel optimization and machine learning. In: Proceedings of the IEEE world congress on computational intelligence, pp 25–47
Xue B, Zhang M, Browne WN, Yao X (2016) A survey on evolutionary computation approaches to feature selection. IEEE Trans Evol Comput 20(4):606–626
Vergara J, Estévez P (2014) A review of feature selection methods based on mutual information. Neural Comput Appl 24:175–186
Canuto AMP, Nascimento DSC (2012) A genetic-based approach to features selection for ensembles using a hybrid and adaptive fitness function. In: Proceedings of the international joint conference on neural networks (IJCNN), pp 1–8
Zhu ZX, Ong Y-S, Dash M (2007) Wrapper-filter feature selection algorithm using a memetic framework. IEEE Trans Syst Man Cybern B 37(1):70–76
Bermejo P, de la Ossa L, Gámez JA, Puerta JM (2012) Fast wrapper feature subset selection in high-dimensional datasets by means of filter reranking. Knowl-Based Syst 25(1):35–44
Ghosh M, Guha R, Sarkar R, Abraham A (2019) A wrapper-filter feature selection technique based on ant colony optimization. Neural Comput Appl. https://doi.org/10.1007/s00521-019-04171-3
He J, Bi Y, Ding L, Li Z, Wang S (2016) Unsupervised feature selection based on decision graph. Neural Comput Appl 28(10):1–13
Shannon C, Weaver W (1948) The mathematical theory of communication. The University of Illinois Press, Champaign. ISBN: 978-0252725487
Kamath U, De Jong K, Shehu A (2014) Effective automated feature construction and selection for classification of biological sequences. PLoS ONE 9(7):e99982
Ahmed S, Zhang M, Peng L (2014) A new gp-based wrapper feature construction approach to classification and biomarker identification. In: Proceedings of the IEEE congress on evolutionary computation, pp 2756–2763
Tran B, Zhang M, Xue B (2016) Multiple feature construction in classification on high-dimensional data using GP. In: Proceedings of IEEE symposium series on computational intelligence, pp 1–8
Hammami M, Bechikh S, Hung C-C, Ben Said L (2018) A multi-objective hybrid filter-wrapper evolutionary approach for feature construction on high-dimensional data. In: Proceedings of IEEE congress on evolutionary computation, pp 1–8
Sahin D, Kessentini M, Bechikh S, Deb K (2014) Code-smell detection as a bilevel problem. ACM Trans Softw Eng Methodol 24(1):1–44
Hammami M, Bechikh S, Hung C-C, Ben Said L (2019) A multi-objective hybrid filter-wrapper evolutionary approach for feature selection. Memet Comput 11(2):193–208
Chaabani A, Bechikh S, Ben Said L (2015) A co-evolutionary decomposition-based algorithm for bi-level combinatorial optimization. In: Proceedings of IEEE congress on evolutionary computation, pp 1659–1666
Patterson G, Zhang M (2007) Fitness functions in genetic programming for classification with unbalanced data. In: Proceedings of advances in artificial intelligence, pp 769–775
Arora JS (2017) Introduction to optimum design. Academic Press. ISBN: 9780128009185
Frank A, Asuncion A (2010) UCI machine learning repository. [Online]. Available: https://archive.ics.uci.edu/ml/datasets.html
Ding C, Peng H (2003) Minimum redundancy feature selection from microarray gene expression data. In: IEEE bioinformatics conference, pp 523–528
Gallo CA, Cecchini RL, Carballido JA, Micheletto S, Ponzoni I (2015) Discretization of gene expression data revised. Brief Bioinform 17(5):758–770
Tran B, Xue B, Zhang M (2015) Genetic programming for feature construction and selection in classification on high-dimensional data. Memet Comput 8(1):3–15
Tran B, Xue B, Zhang M (2019) Genetic programming for multiple-feature construction on high-dimensional classification. Pattern Recognit 93(1):404–417
Statnikov A, Aliferis CF, Tsamardinos I, Hardin D, Levy S (2005) A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics 21(5):631–643
Xue B, Zhang M, Browne WN (2013) Particle swarm optimization for feature selection in classification: a multi-objective approach. IEEE Trans Cybern 43(6):1656–1671
Eiben AE, Smit S (2011) Parameter tuning for configuring and analyzing evolutionary algorithms. Swarm Evol Comput 1(1):19–31
Ross PJ (1996) Taguchi techniques for quality engineering: loss function, orthogonal experiments, parameter and tolerance design. McGraw Hill Professional, New York. ISBN: 978-0070538665
Phadke MS (1995) Quality engineering using robust design. Prentice Hall PTR. ISBN: 978-0137451678
Butler-Yeoman T, Xue B, Zhang M (2015) Particle swarm optimisation for feature selection: a hybrid filter-wrapper approach. In: Proceedings of the IEEE congress on evolutionary computation, pp 2428–2435
Derrac J, García S, Molina D, Herrera F (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol Comput 1(1):3–18
Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York. ISBN: 978-0470316801
Brock G, Pihur V, Datta S, Datta S (2008) clValid: an R package for cluster validation. J Stat Softw 25(4):1–22
Peralta D, Río SD, Ramírez-Gallego S, Triguero I, Benítez JM, Herrera F (2015) Evolutionary feature selection for big data classification: a MapReduce approach. Math Probl Eng 2015(1):1–11
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendices
Appendix A
As we have mentioned in Sect. 3, the main goal of CODBA [23] is to reduce the high computational cost of the lower-level task by using decomposition, coevolution, and multi-threading. Originally, CODBA was proposed to solve a bi-level production–distribution problem in supply chain management. Its basic scheme is illustrated in Fig. 8, and its step-by-step procedure is described in subsections A.1 and A.2. As BFC-GA is an adapted version of CODBA for the bi-level feature construction problem, we highlight in Table 8 the main differences between these two algorithms.
A.1 CODBA upper-level main principle
The step-by-step upper-level procedure of CODBA is as follows (a minimal sketch of this loop is given after the steps):
Step 1 (Initialization scheme): Randomly generate an initial parent population of N individuals over the upper-level variables. Then, the lower-level optimization problem is solved for each individual to identify its optimal lower-level solution. The upper-level fitness is assigned based on both the upper-level function value and the constraints, since the lower-level problem acts as a constraint on the upper-level one.
Step 2 (Upper-level parent selection): Choose (N/2) population members from the parent population using tournament selection.
Step 3 (Variation at the upper level): Apply crossover and mutation operators to the selected parents in order to create the offspring population.
Step 4 (Lower-level optimization): Solve the lower-level optimization problem for each offspring using the decomposition-based coevolutionary parallel scheme (cf. the following subsection).
Step 5 (Offspring evaluation): Combine the upper-level parents and children into a population \(R_t\) and evaluate its members based on the upper-level objective function and constraints.
Step 6 (Environmental selection): Fill the new upper-level population with the N best solutions of \(R_t\). If the stopping criterion is met, return the best upper-level solution; otherwise, go to Step 2.
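To make the control flow concrete, the following minimal Python sketch reproduces this upper-level loop under toy assumptions of ours: a bit-string encoding for feature masks, a placeholder `lower_level_optimum` standing in for the full lower-level search of Appendix A.2, and an illustrative fitness. All names and parameter values are hypothetical, not the authors' implementation.

```python
import random

N, N_FEATURES, MAX_GEN = 20, 10, 50

def lower_level_optimum(subset):
    # Placeholder for the decomposition-based coevolutionary search of
    # Appendix A.2; here it simply scores the mask so the loop runs.
    return sum(subset)

def upper_fitness(ind):
    # Steps 4-5: the upper-level fitness depends on the lower-level optimum,
    # which acts as a constraint on the upper-level problem (toy objective).
    return lower_level_optimum(ind) - 0.1 * sum(ind)

def tournament(pop, k=2):
    return max(random.sample(pop, k), key=upper_fitness)

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(ind, rate=0.1):
    return [1 - g if random.random() < rate else g for g in ind]

# Step 1: random initial parent population over the upper-level variables
pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(N)]
for gen in range(MAX_GEN):
    # Step 2: tournament selection of N/2 parents
    parents = [tournament(pop) for _ in range(N // 2)]
    # Step 3: variation (crossover + mutation) creates the offspring
    offspring = [mutate(crossover(random.choice(parents), random.choice(parents)))
                 for _ in range(N)]
    # Steps 5-6: R_t = parents + children; keep the N best (elitist replacement)
    pop = sorted(pop + offspring, key=upper_fitness, reverse=True)[:N]

best_subset = max(pop, key=upper_fitness)
```

The key structural point is that every call to `upper_fitness` triggers a lower-level optimization, which is exactly why CODBA invests in decomposition and multi-threading at the lower level.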
A.2 CODBA lower-level main principle
In order to cope with the high computational cost, the lower-level population is decomposed into M well-distributed subpopulations over the lower-level search space. Each subpopulation can be seen as a cluster of lower-level solutions whose centroids are well distributed so as to cover the whole search space as much as possible. In this way, each subpopulation is responsible for a specific region. All subpopulations coevolve in parallel using M threads (one thread per subpopulation). Besides, the best solution of each subpopulation is stored in an archive. These best solutions are subsequently exchanged between the different subpopulations via crossover. Indeed, recombination with the archive solutions allows information exchange between the different subpopulations, with the aim of finding a better lower-level global optimum. The step-by-step procedure of the lower-level optimization algorithm is described as follows (a minimal sketch is given after the steps):
Step 1 (Lower-level decomposition): For each upper-level solution, M well-distributed lower-level subpopulations are generated over the whole discrete decision space, using the decomposition method for discrete decision spaces described in Section III-B of [23]. Once these subpopulations are generated, each subpopulation member is evaluated using the lower-level objective function and constraints. Note that all subpopulations are evolved simultaneously, with one thread per subpopulation.
Step 2 (Lower-level parent selection): Choose (SPS/2) members from each lower-level parent subpopulation using tournament selection, where SPS is the subpopulation size.
Step 3 (Variation at the lower level): Perform the crossover and mutation operations in order to create an offspring subpopulation for each parent subpopulation.
Step 4 (Offspring evaluation): Combine each parent subpopulation with its corresponding offspring population and evaluate them using the lower-level objective function and the constraints.
Step 5 (Environmental selection): Fill each new lower-level subpopulation with the SPS best solutions of the corresponding combined population. If the stopping criterion is met, store the best found lower-level solution in the archive; otherwise, return to Step 2.
Step 6 (Coevolution): Each subpopulation member is crossed over with one of the best archive members of the other subpopulations, yielding an offspring population for each subpopulation. Thereafter, each subpopulation is combined with its corresponding offspring population and updated by keeping the best SPS solutions. This process is repeated until the best lower-level fitness value no longer improves for K generations or MaxGenCoEvol is reached, where MaxGenCoEvol is the maximum allowed number of coevolution generations. Once the coevolution terminates, the lower-level global optimum is returned to the upper level to evaluate the upper-level solutions.
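The following self-contained Python sketch illustrates this decomposed coevolution under toy assumptions of ours (a bit-string encoding, a trivial `lower_fitness`, and random initialization in place of the decomposition method of [23]). In CODBA the M subpopulations evolve in parallel threads; here they are evolved sequentially for readability.

```python
import random

M, SPS, DIM, MAX_GEN_COEVOL, K = 4, 10, 8, 30, 5

def lower_fitness(sol):
    return sum(sol)  # toy lower-level objective

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

# Step 1: M subpopulations; random initialization stands in for the
# decomposition method of Section III-B of [23]
subpops = [[[random.randint(0, 1) for _ in range(DIM)] for _ in range(SPS)]
           for _ in range(M)]
archive = [max(sp, key=lower_fitness) for sp in subpops]  # best of each subpop

best, stall = max(archive, key=lower_fitness), 0
for gen in range(MAX_GEN_COEVOL):
    for i, sp in enumerate(subpops):
        # Step 6: cross each member with a best archive member of ANOTHER
        # subpopulation, then keep the SPS best of parents + offspring
        partners = [archive[j] for j in range(M) if j != i]
        children = [crossover(ind, random.choice(partners)) for ind in sp]
        subpops[i] = sorted(sp + children, key=lower_fitness, reverse=True)[:SPS]
        archive[i] = subpops[i][0]
    gen_best = max(archive, key=lower_fitness)
    # Stop when the best fitness stops improving for K generations or
    # MaxGenCoEvol is reached; 'best' is returned to the upper level.
    if lower_fitness(gen_best) > lower_fitness(best):
        best, stall = gen_best, 0
    else:
        stall += 1
        if stall >= K:
            break
```

The archive-mediated crossover is what couples the otherwise independent subpopulations, letting information about promising regions propagate without merging the subpopulations themselves.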
Fig. 8 Basic idea of CODBA [23]
A.3 Main differences between BFC-GA and CODBA
A comparison between BFC-GA and CODBA is presented in Table 8.
Appendix B
The orthogonal array \(L_{27}(3^8)\), corresponding to 27 experiments, eight variables, and three levels, is presented in Table 9. The experimental results for the constructed features, the selected features, and the combined selected-and-constructed features are presented in Tables 10, 11, and 12, respectively.
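For readers unfamiliar with such designs, the sketch below shows one standard way to build a 27-run, three-level orthogonal array (an assumption on our part, not necessarily the construction used for Table 9): index the runs by three base digits over GF(3) and take each column to be a distinct, pairwise non-proportional linear combination of those digits modulo 3; any eight such columns form an \(L_{27}(3^8)\).

```python
from itertools import product

# Eight pairwise non-proportional coefficient vectors over GF(3); each one
# defines a column of the array as a linear function of the three base digits.
coeffs = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0),
          (1, 2, 0), (1, 0, 1), (1, 0, 2), (0, 1, 1)]

# 27 runs: every (a, b, c) in {0, 1, 2}^3; each pair of columns then takes
# all nine level combinations equally often, which is what orthogonality means.
rows = [[(u * a + v * b + w * c) % 3 for (u, v, w) in coeffs]
        for a, b, c in product(range(3), repeat=3)]

assert len(rows) == 27 and all(len(r) == 8 for r in rows)
```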