Abstract
In ensemble feature selection, adjusting the weight assigned to each feature subset can change the ensemble's performance substantially; finding an optimized weight vector is therefore a key and challenging problem. Targeting this optimization problem, this paper proposes an ensemble feature selection approach based on a genetic algorithm (EFS-BGA). After each base feature selector generates a feature subset, EFS-BGA uses a genetic algorithm to learn an optimized weight for each subset, in contrast to traditional genetic-algorithm approaches that operate directly on individual features. We present two variants of EFS-BGA: a complete ensemble feature selection method and, building on it, a selective EFS-BGA model. We then show, through mathematical analysis, why weight adjustment is an optimization problem and how it can be optimized. Finally, comparative experiments on multiple data sets demonstrate the practical advantages of EFS-BGA over previous ensemble feature selection algorithms.
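The idea described in the abstract can be sketched in code. This is a minimal toy illustration, not the paper's actual implementation: the base selectors' subsets and the feature-relevance scores below are hypothetical, and a synthetic fitness stands in for the classifier-accuracy evaluation a real wrapper method would use. It shows only the structural point the abstract makes, namely that the genetic algorithm evolves a weight vector over the *subsets*, and the final feature mask is obtained by a weighted vote of those subsets.

```python
import random

N_FEATURES = 8

# Assume three base feature selectors each returned a subset (0/1 mask).
SUBSETS = [
    [1, 1, 0, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 0, 1, 0],
]

# Hypothetical relevance of each feature, standing in for the validation
# accuracy a real wrapper would measure after training a classifier.
RELEVANCE = [0.9, 0.8, 0.1, 0.7, 0.6, 0.1, 0.2, 0.05]

def aggregate(weights, threshold=0.5):
    """Weighted vote over subsets: keep a feature whose weighted
    support exceeds the threshold."""
    total = sum(weights) or 1e-9
    scores = [sum(w * s[j] for w, s in zip(weights, SUBSETS)) / total
              for j in range(N_FEATURES)]
    return [1 if sc > threshold else 0 for sc in scores]

def fitness(weights):
    """Stand-in fitness: reward keeping relevant features, penalize
    keeping irrelevant ones."""
    mask = aggregate(weights)
    return sum((r - 0.5) * m for r, m in zip(RELEVANCE, mask))

def evolve(pop_size=20, gens=30, seed=0):
    """Simple GA over subset-weight vectors: truncation selection,
    one-point crossover, random-reset mutation, elitist survival."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in SUBSETS] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the top half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(SUBSETS))
            child = a[:cut] + b[cut:]           # one-point crossover
            if rng.random() < 0.2:              # mutate one weight
                child[rng.randrange(len(child))] = rng.random()
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print("best subset weights:", best)
print("selected features:  ", aggregate(best))
```

Note that the chromosome has one gene per base selector (three here), not one gene per feature, which is exactly the distinction the abstract draws between EFS-BGA and a genetic algorithm that searches the feature space directly.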
Acknowledgements
This work was funded by NSFC (Grant Nos. U1509216, U1866602, 61472099, 61602129) and the National Key Research and Development Program of China (Grant No. 2016YFB1000703).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Wang, H., He, C. & Li, Z. A new ensemble feature selection approach based on genetic algorithm. Soft Comput 24, 15811–15820 (2020). https://doi.org/10.1007/s00500-020-04911-x