Feature construction as a bi-level optimization problem

  • Original Article
  • Neural Computing and Applications

Abstract

Feature selection and construction are important preprocessing techniques in data mining. They allow not only dimensionality reduction but also improvements in classification accuracy and efficiency. While feature selection consists in selecting a subset of relevant features from the original feature set, feature construction corresponds to the generation of new high-level features, called constructed features, each of which is a combination of a subset of original features. Based on these definitions, feature construction can be seen as a bi-level optimization problem in which a feature subset should be defined first and the corresponding (near) optimal combination of the selected features should then be found. Motivated by this observation, we propose in this paper a bi-level evolutionary approach for feature construction. The basic idea of our algorithm, named bi-level feature construction genetic algorithm (BFC-GA), is to evolve an upper-level population for the task of feature selection, while optimizing the feature combinations at the lower level by evolving a follower population. It is worth noting that for each upper-level individual (feature subset), a whole lower-level population is optimized to find the corresponding (near) optimal feature combination (constructed feature). In this way, BFC-GA is able to output a set of optimized constructed features that can be very informative to the considered classifier. A detailed experimental study has been conducted on a set of commonly used datasets with varying dimensions. The statistical analysis of the obtained results shows the competitiveness and the outperformance of our bi-level feature construction approach with respect to many state-of-the-art algorithms.


Notes

  1. http://www.minitab.com/en-us/.

References

  1. Gheyas IA, Smith LS (2010) Feature subset selection in large dimensionality domains. Pattern Recognit 43(1):5–13

  2. Liu H, Motoda H (1998) Feature extraction, construction and selection: a data mining perspective. Kluwer Academic Publishers, Norwell. ISBN 978-1-4615-5725-8

  3. Cerrada M, Sanchez RV, Pacheco F, Cabrera D, Zurita G, Li C (2016) Hierarchical feature selection based on relative dependency for gear fault diagnosis. Appl Intell 44(3):687–703

  4. Pes B (2019) Ensemble feature selection for high-dimensional data: a stability analysis across multiple domains. Neural Comput Appl. https://doi.org/10.1007/s00521-019-04082-3

  5. Muharram M, Smith G (2005) Evolutionary constructive induction. IEEE Trans Knowl Data Eng 17(11):1518–1528

  6. Neshatian K, Zhang M, Andreae P (2012) A filter approach to multiple feature construction for symbolic learning classifiers using genetic programming. IEEE Trans Evol Comput 16(5):645–661

  7. Colson B, Marcotte P, Savard G (2007) An overview of bilevel optimization. Ann Oper Res 153(1):235–256

  8. Bennett KP, Kunapuli G, Hu J, Pang J-S (2008) Bilevel optimization and machine learning. In: Proceedings of the IEEE world congress on computational intelligence, pp 25–47

  9. Xue B, Zhang M, Browne WN, Yao X (2016) A Survey on evolutionary computation approaches to feature selection. IEEE Trans Evol Comput 20(4):606–626

  10. Vergara J, Estévez P (2014) A review of feature selection methods based on mutual information. Neural Comput Appl 24:175–186

  11. Canuto AMP, Nascimento DSC (2012) A genetic-based approach to features selection for ensembles using a hybrid and adaptive fitness function. In: Proceedings of the international joint conference on neural networks (IJCNN), pp 1–8

  12. Zhu ZX, Ong Y-S, Dash M (2007) Wrapper-filter feature selection algorithm using a memetic framework. IEEE Trans Syst Man Cybern B 37(1):70–76

  13. Bermejo P, de la Ossa L, Gámez JA, Puerta JM (2012) Fast wrapper feature subset selection in high-dimensional datasets by means of filter reranking. Knowl-Based Syst 25(1):35–44

  14. Ghosh M, Guha R, Sarkar R, Abraham A (2019) A wrapper-filter feature selection technique based on ant colony optimization. Neural Comput Appl. https://doi.org/10.1007/s00521-019-04171-3

  15. He J, Bi Y, Ding L, Li Z, Wang S (2016) Unsupervised feature selection based on decision graph. Neural Comput Appl 28(10):1–13

  16. Shannon C, Weaver W (1948) The mathematical theory of communication, 144. The University of Illinois Press, Champaign ISBN:978-0252725487

  17. Kamath U, De Jong K, Shehu A (2014) Effective automated feature construction and selection for classification of biological sequences. PLoS ONE 9(7):e99982

  18. Ahmed S, Zhang M, Peng L (2014) A new gp-based wrapper feature construction approach to classification and biomarker identification. In: Proceedings of the IEEE congress on evolutionary computation, pp 2756–2763

  19. Tran B, Zhang M, Xue B (2016) Multiple feature construction in classification on high-dimensional data using GP. In: Proceedings of IEEE symposium series on computational intelligence, pp 1–8

  20. Hammami M, Bechikh S, Hung C-C, Ben Said L (2018) A multi-objective hybrid filter-wrapper evolutionary approach for feature construction on high-dimensional data. In: Proceedings of IEEE congress on evolutionary computation, pp 1–8

  21. Sahin D, Kessentini M, Bechikh S, Deb K (2014) Code-smell detection as a bilevel problem. ACM Trans Softw Eng Methodol 24(1):1–44

  22. Hammami M, Bechikh S, Hung C-C, Ben Said L (2019) A multi-objective hybrid filter-wrapper evolutionary approach for feature selection. Memet Comput 11(2):193–208

  23. Chaabani A, Bechikh S, Ben Said L (2015) A co-evolutionary decomposition-based algorithm for bi-level combinatorial optimization. In: Proceedings of IEEE congress on evolutionary computation, pp 1659–1666

  24. Patterson G, Zhang M (2007) Fitness functions in genetic programming for classification with unbalanced data. In: Proceedings of advances in artificial intelligence, pp 769–775

  25. Arora JS (2017) Introduction to optimum design. Academic Press. ISBN: 9780128009185

  26. Frank A, Asuncion A (2010) UCI machine learning repository. [Online]. Available: https://archive.ics.uci.edu/ml/datasets.html

  27. Ding C, Peng H (2003) Minimum redundancy feature selection from microarray gene expression data. In: IEEE bioinformatics conference, pp 523–528

  28. Gallo CA, Cecchini RL, Carballido JA, Micheletto S, Ponzoni I (2015) Discretization of gene expression data revised. Brief Bioinform 17(5):758–770

  29. Tran B, Xue B, Zhang M (2015) Genetic programming for feature construction and selection in classification on high-dimensional data. Memet Comput 8(1):3–15

  30. Tran B, Xue B, Zhang M (2019) Genetic programming for multiple-feature construction on high-dimensional classification. Pattern Recognit 93(1):404–417

  31. Statnikov A, Aliferis CF, Tsamardinos I, Hardin D, Levy S (2005) A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics 21(5):631–643

  32. Xue B, Zhang M, Browne WN (2013) Particle swarm optimization for feature selection in classification: a multi-objective approach. IEEE Trans Cybern 43(6):1656–1671

  33. Eiben AE, Smit S (2011) Parameter tuning for configuring and analyzing evolutionary algorithms. Swarm Evol Comput 1(1):19–31

  34. Ross PJ (1996) Taguchi techniques for quality engineering: loss function, orthogonal experiments, parameter and tolerance design. McGraw Hill Professional, New York. ISBN: 978-0070538665

  35. Phadke MS (1995) Quality engineering using robust design. Prentice Hall PTR. ISBN: 978-0137451678

  36. Butler-Yeoman T, Xue B, Zhang M (2015) Particle swarm optimisation for feature selection: a hybrid filter-wrapper approach. In: Proceedings of the IEEE congress on evolutionary computation, pp 2428–2435

  37. Derrac J, García S, Molina D, Herrera F (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol Comput 1(1):3–18

  38. Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York ISBN:9780470316801

  39. Brock G, Pihur V, Datta S, Datta S (2008) clValid: an R package for cluster validation. J Stat Softw 25(4):1–22

  40. Peralta D, Río SD, Ramírez-Gallego S, Triguero I, Benítez JM, Herrera F (2015) Evolutionary feature selection for big data classification: a MapReduce approach. Math Probl Eng 2015(1):1–11

Author information

Corresponding author

Correspondence to Slim Bechikh.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

As we have mentioned in Sect. 3, the main goal of CODBA [23] is to reduce the high computational cost of the lower-level task by using decomposition, coevolution, and multi-threading. Originally, CODBA was proposed to solve a bi-level production–distribution problem in supply chain management. Its basic scheme is illustrated in Fig. 8, and its step-by-step procedure is described in subsections A.1 and A.2. As BFC-GA is an adapted version of CODBA for the bi-level feature construction problem, we highlight in Table 8 the main differences between these two algorithms.

1.1 A.1 CODBA upper-level main principle

The step-by-step upper-level procedure of CODBA is described as follows (a minimal sketch is given after the list):

Step 1 (Initialization scheme): Randomly generate an initial parent population of N individuals encoded with upper-level variables. Thereafter, the lower-level optimization problem is solved for each individual to identify its optimal lower-level solution. The upper-level fitness is then assigned based on both the upper-level objective value and the constraints, since the lower-level problem acts as a constraint on the upper-level one.

Step 2 (Upper-level parent selection): Choose (N/2) population members from the parent population using tournament selection.

Step 3 (Variation at the upper level): Apply crossover and mutation operators to create the offspring population.

Step 4 (Lower-level optimization): Solve the lower-level optimization problem for each offspring using the decomposition-based coevolutionary parallel scheme (cf. the following subsection).

Step 5 (Offspring evaluation): Combine the upper-level parents and offspring into a population \(R_t\) and evaluate them based on the upper-level objective function and the constraints.

Step 6 (Environmental selection): Fill the new upper-level population using a replacement strategy: the new upper-level population is formed with the N best solutions of \(R_t\). If the stopping criterion is met, then return the best upper-level solution; otherwise, return to Step 2.
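To make the nesting concrete, the following sketch mirrors Steps 1–6. It is a minimal Python illustration written for this summary, not the authors' implementation: `init_individual`, `crossover`, `mutate`, `upper_fitness`, and `solve_lower_level` are assumed placeholders, with `solve_lower_level` standing for the whole lower-level optimization of subsection A.2 and `crossover` assumed to return an iterable of children.

```python
import random

def codba_upper_level(n_pop, max_gen, init_individual, crossover, mutate,
                      upper_fitness, solve_lower_level):
    """Sketch of CODBA's upper-level loop (Appendix A.1); placeholder callables."""
    # Step 1: random initialization; each individual is paired with its
    # optimized lower-level solution before the upper-level fitness is assigned.
    population = []
    for _ in range(n_pop):
        ind = init_individual()
        low = solve_lower_level(ind)
        population.append((ind, low, upper_fitness(ind, low)))

    for _ in range(max_gen):
        # Step 2: tournament selection of N/2 parents.
        parents = [max(random.sample(population, 2), key=lambda s: s[2])
                   for _ in range(n_pop // 2)]

        # Steps 3-5: variation, nested lower-level optimization, evaluation.
        offspring = []
        for i in range(0, len(parents) - 1, 2):
            for child in crossover(parents[i][0], parents[i + 1][0]):
                child = mutate(child)
                low = solve_lower_level(child)          # Step 4
                offspring.append((child, low, upper_fitness(child, low)))

        # Step 6: environmental selection keeps the N best of R_t.
        r_t = population + offspring
        population = sorted(r_t, key=lambda s: s[2], reverse=True)[:n_pop]

    return max(population, key=lambda s: s[2])
```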

1.2 A.2 CODBA lower-level main principle

In order to cope with the high computational cost, the lower-level population is decomposed into M well-distributed subpopulations over the lower-level search space. Each subpopulation can be seen as a cluster of lower-level solutions whose centroids are well distributed so as to cover the whole search space as much as possible. In this way, each subpopulation is responsible for a specific region. All subpopulations coevolve in parallel using M threads (one thread per subpopulation). Moreover, the best solution of each subpopulation is stored in an archive. These best solutions are subsequently exchanged between the different subpopulations via crossover; recombination with the archive solutions allows information exchange between the subpopulations with the aim of finding a better lower-level global optimum. The step-by-step procedure of the lower-level optimization algorithm is described as follows (a minimal sketch is given after the list):

Step 1 (Lower-level decomposition): For each upper-level solution, M well-distributed lower-level subpopulations are generated over the whole discrete decision space, using the decomposition method for discrete decision spaces described in Section III-B of [23]. Once these subpopulations are generated, each subpopulation member is evaluated using the lower-level objective function and constraints. It is important to note that all subpopulations are evolved simultaneously, with one thread per subpopulation.

Step 2 (Lower-level parent selection): Choose (SPS/2) members from each lower-level parent subpopulation using tournament selection, where SPS is the subpopulation size.

Step 3 (Variation at the lower level): Perform the crossover and mutation operations in order to create an offspring subpopulation for each parent subpopulation.

Step 4 (Offspring evaluation): Combine each parent subpopulation with its corresponding offspring population and evaluate them using the lower-level objective function and the constraints.

Step 5 (Environmental selection): Fill each new lower-level subpopulation using a replacement strategy. In fact, each new lower-level subpopulation is formed with the SPS best solutions of the combined one. If the stopping criterion is met, then store the best found lower-level solution in the archive; otherwise, return to Step 2.

Step 6 (Coevolution): Each subpopulation member is crossed over with one of the best archive members of the other subpopulations, yielding an offspring population for each subpopulation. Thereafter, each subpopulation is combined with its corresponding offspring population and updated by selecting the SPS best solutions. This process is repeated until the best lower-level fitness value has not improved for K generations or MaxGenCoEvol is reached, where MaxGenCoEvol is the maximum allowed number of coevolution generations. Once coevolution terminates, the global optimum is returned to the upper level to evaluate the upper-level solutions.
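The sketch below (again a minimal Python illustration under assumptions, not the authors' code) mirrors the decomposition, threaded evolution, archive exchange, and stopping rule described above. Here `make_subpopulations` stands for the decomposition method of [23, Section III-B], `evolve_subpop` runs Steps 2–5 on one subpopulation and returns it together with its best solution, and `crossover` and `lower_fitness` are hypothetical placeholders; at least two subpopulations are assumed.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def codba_lower_level(upper_solution, make_subpopulations, evolve_subpop,
                      crossover, lower_fitness, m_subpops, sps,
                      k_stall, max_gen_coevol):
    """Sketch of CODBA's decomposition-based coevolutionary lower level (A.2)."""
    # Step 1: M well-distributed subpopulations over the lower-level space,
    # each evolved (Steps 2-5) in its own thread.
    subpops = make_subpopulations(upper_solution, m_subpops, sps)
    with ThreadPoolExecutor(max_workers=m_subpops) as pool:
        results = list(pool.map(lambda sp: evolve_subpop(sp, upper_solution), subpops))
    subpops = [sp for sp, _ in results]
    archive = [best for _, best in results]        # one best solution per subpopulation

    # Step 6: coevolution via crossover with other subpopulations' archive members.
    best_so_far, stall = max(archive, key=lower_fitness), 0
    for _ in range(max_gen_coevol):
        new_subpops = []
        for idx, subpop in enumerate(subpops):
            others = [a for j, a in enumerate(archive) if j != idx]
            offspring = [child for ind in subpop
                         for child in crossover(ind, random.choice(others))]
            combined = subpop + offspring
            new_subpops.append(sorted(combined, key=lower_fitness, reverse=True)[:sps])
        subpops = new_subpops
        archive = [max(sp, key=lower_fitness) for sp in subpops]

        current_best = max(archive, key=lower_fitness)
        if lower_fitness(current_best) > lower_fitness(best_so_far):
            best_so_far, stall = current_best, 0
        else:
            stall += 1
            if stall >= k_stall:                   # no improvement for K generations
                break
    return best_so_far                             # returned to the upper level
```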

Fig. 8 Basic idea of CODBA [23]

1.3 A.3 Main differences between BFC-GA and CODBA

A comparison between BFC-GA and CODBA is presented in Table 8.

Table 8 Main differences between BFC-GA and CODBA

Appendix B

The orthogonal array \(L_{27}(3^8)\), corresponding to 27 experiments, eight variables, and three levels, is presented in Table 9. The experimental results for the constructed features, the selected features, and the selected and constructed features are presented in Tables 10, 11, and 12, respectively (a minimal construction sketch for such an array is given after the list of table captions).

Table 9 The orthogonal array \(L_{27}\)
Table 10 Best, average, and Std of the accuracy of the constructed features
Table 11 Best, average, and Std of the accuracy of the selected features
Table 12 Best, average, and Std of the accuracy of the selected and constructed features
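For readers who wish to reproduce such a design, the sketch below builds a 27-run, three-level array with eight pairwise-orthogonal columns using the standard GF(3) linear-combination construction and checks its strength-2 property. It is an illustrative sketch only, written for this summary; the particular column assignment used in Table 9 (and by Taguchi software such as Minitab) may differ.

```python
import itertools
import numpy as np

def l27_orthogonal_array(n_columns=8):
    """Construct an OA(27, n_columns, 3, 2): 27 runs, 3 levels, strength 2."""
    # 27 runs: the full factorial of three base factors a, b, c in {0, 1, 2}.
    runs = np.array(list(itertools.product(range(3), repeat=3)))

    # Coefficient vectors whose first nonzero entry is 1 (13 of them);
    # non-proportional vectors give pairwise-orthogonal columns over GF(3).
    coeffs = [v for v in itertools.product(range(3), repeat=3)
              if any(v) and v[next(i for i, x in enumerate(v) if x)] == 1]

    columns = [(runs @ np.array(v)) % 3 for v in coeffs[:n_columns]]
    return np.column_stack(columns)                 # shape (27, n_columns)

def is_orthogonal(array):
    """Check strength-2 orthogonality: every level pair appears equally often."""
    n_rows, n_cols = array.shape
    for i, j in itertools.combinations(range(n_cols), 2):
        pairs = list(zip(array[:, i], array[:, j]))
        counts = {p: pairs.count(p) for p in set(pairs)}
        if set(counts.values()) != {n_rows // 9}:   # 27 / 3^2 = 3 per level pair
            return False
    return True

oa = l27_orthogonal_array(8)
print(oa.shape)            # (27, 8)
print(is_orthogonal(oa))   # True
```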

About this article

Cite this article

Hammami, M., Bechikh, S., Louati, A. et al. Feature construction as a bi-level optimization problem. Neural Comput & Applic 32, 13783–13804 (2020). https://doi.org/10.1007/s00521-020-04784-z
