Abstract
Several previous works have shown that incorporating prior knowledge into machine learning models helps to overcome the curse of dimensionality in high-dimensional settings. However, most of these works are based on simple linear models (or variations of them) or assume that a pre-defined variable grouping structure is known in advance, which will not always be possible. This paper presents a hybrid genetic algorithm and machine learning approach that aims to learn the variable grouping structure during the model estimation process, thus taking advantage of the benefits of models based on problem-specific information without requiring any a priori information about the variable structure. The approach was tested on four synthetic datasets and its performance was compared against two well-known reference models (LASSO and Group-LASSO). The results show that the proposed approach, called GAGL, considerably outperformed LASSO and performed as well as Group-LASSO in high-dimensional settings, with the added benefit of learning the variable grouping structure from data instead of requiring it before estimating the model.
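To make the role of the grouping structure concrete, the following is a minimal sketch, not the authors' implementation. It evaluates the standard LASSO penalty and the Group-LASSO penalty in the form of Yuan and Lin for a given assignment of variables to groups. The function names and the `groups` encoding (one integer group label per variable, as a candidate solution in a genetic algorithm might be represented) are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def lasso_penalty(beta, lam):
    """L1 penalty: lam * sum_j |beta_j|."""
    return lam * sum(abs(b) for b in beta)


def group_lasso_penalty(beta, groups, lam):
    """Group-LASSO penalty: lam * sum_g sqrt(p_g) * ||beta_g||_2.

    groups[j] is a hypothetical integer label naming the group of variable j;
    p_g is the number of variables carrying label g.
    """
    by_group = defaultdict(list)
    for b, g in zip(beta, groups):
        by_group[g].append(b)
    return lam * sum(
        math.sqrt(len(coefs)) * math.sqrt(sum(c * c for c in coefs))
        for coefs in by_group.values()
    )


# Illustrative coefficients and two candidate grouping structures.
beta = [3.0, 4.0, 0.0, 0.0]
singletons = [0, 1, 2, 3]  # every variable in its own group
paired = [0, 0, 1, 1]      # the two active variables share a group

l1 = lasso_penalty(beta, 1.0)
gl_singletons = group_lasso_penalty(beta, singletons, 1.0)
gl_paired = group_lasso_penalty(beta, paired, 1.0)
```

With singleton groups the Group-LASSO penalty coincides with the LASSO penalty, while other groupings penalize the same coefficients differently. This is why the choice of grouping affects the estimated model, and why a search procedure over groupings, such as the genetic algorithm described above, is one way to avoid fixing the structure in advance.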
Acknowledgments
The authors acknowledge support through grants RTI2018-098160-B-I00 and RTI2018-100754-B-I00 from the Spanish Ministerio de Ciencia, Innovación y Universidades, which include ERDF funds, and from project 202C1800003 (UIC Airbus).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Nimo, D., Dorronsoro, B., Turias, I.J., Urda, D. (2020). Learning Variables Structure Using Evolutionary Algorithms to Improve Predictive Performance. In: Dorronsoro, B., Ruiz, P., de la Torre, J., Urda, D., Talbi, EG. (eds) Optimization and Learning. OLA 2020. Communications in Computer and Information Science, vol 1173. Springer, Cham. https://doi.org/10.1007/978-3-030-41913-4_6
DOI: https://doi.org/10.1007/978-3-030-41913-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41912-7
Online ISBN: 978-3-030-41913-4
eBook Packages: Computer Science, Computer Science (R0)