Abstract
This paper addresses feature selection as a means of improving a flood forecasting model. The proposed model is evaluated through a case study that uses 18 different time series from thirty-five years of hydrological data to forecast the level of the Xingu River, in the Amazon rainforest in Brazil. We employ a Genetic Algorithm for feature selection and explore several genetic parameter settings to improve prediction accuracy. The features selected by the Genetic Algorithm are used as input to a Linear Regression model that performs the forecasting. A statistical analysis confirms that the final model predicts the river level with high accuracy, obtaining a coefficient of determination of 0.988. Hence, the proposed Genetic Algorithm proved successful in selecting the most relevant features.
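To make the pipeline described in the abstract concrete, the following is a minimal sketch of Genetic Algorithm feature selection wrapped around a linear regression. It is not the authors' implementation: the abstract does not state the operators, parameter values, or fitness function used, so the tournament selection, one-point crossover, bit-flip mutation, elitism, validation-set R² fitness, the synthetic data, and all names (`ga_select`, `fitness`, `make_data`, and so on) are assumptions chosen for illustration.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def fit_linear(rows, y):
    """Least squares with intercept via the normal equations.
    The tiny diagonal term (an assumption) guards against singular systems."""
    xs = [[1.0] + r for r in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) + (1e-9 if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(xs, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coef, rows):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]

def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def fitness(mask, train, valid):
    """Validation R^2 of a linear model fitted on the features where mask is 1."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return float("-inf")  # an empty feature set is never acceptable
    pick = lambda rows: [[r[i] for i in cols] for r in rows]
    coef = fit_linear(pick(train[0]), train[1])
    return r2(valid[1], predict(coef, pick(valid[0])))

def ga_select(train, valid, n_features, pop_size=20, generations=30,
              p_mut=0.1, seed=0):
    """Hypothetical GA: tournament selection, one-point crossover,
    bit-flip mutation, and elitism (the best mask always survives)."""
    rng = random.Random(seed)
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, train, valid)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        nxt = [max(pop, key=score)[:]]  # elitism
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            cut = rng.randint(1, n_features - 1)
            child = p1[:cut] + p2[cut:]
            nxt.append([1 - b if rng.random() < p_mut else b for b in child])
        pop = nxt
    best = max(pop, key=score)
    return best, score(best)

def make_data(n, rng):
    """Synthetic data (not the Xingu data): only features 0 and 3 matter."""
    rows = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(n)]
    y = [3.0 * r[0] - 2.0 * r[3] + rng.gauss(0, 0.05) for r in rows]
    return rows, y

data_rng = random.Random(42)
train, valid = make_data(80, data_rng), make_data(40, data_rng)
best_mask, best_r2 = ga_select(train, valid, n_features=6)
print(best_mask, round(best_r2, 3))
```

Scoring each candidate mask on a held-out split, rather than on the training data, is one way to keep the search from rewarding masks that simply include every feature. Whether the paper uses a comparable scheme is not stated in the abstract.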
Notes
1. Equation for the Coefficient of Determination: \(R^{2} = 1 - \frac{\sum_{i=1}^{n}(y_{true,i} - y_{pred,i})^{2}}{\sum_{i=1}^{n}(y_{true,i} - \bar{y})^{2}}\), where \(y_{true,i}\) is the i-th observed value, \(y_{pred,i}\) is the corresponding prediction, \(\bar{y}\) is the mean of the observed values, and \(n\) is the number of observations.
2. Equation for the Root Mean Square Error: \(RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_{true,i} - y_{pred,i})^{2}}\)
3. Equation for the Mean Absolute Error: \(MAE = \frac{1}{n}\sum_{i=1}^{n}|y_{true,i} - y_{pred,i}|\)
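As a small worked illustration of the three metrics defined in these notes, the sketch below computes them in plain Python. The function names and the four sample values are assumptions made for the example and are not taken from the paper's data.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

# Made-up values: three exact predictions and one that is off by 1.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]

print(r_squared(y_true, y_pred))  # SS_res = 1, SS_tot = 5
print(rmse(y_true, y_pred))       # sqrt(1 / 4)
print(mae(y_true, y_pred))        # 1 / 4
```

For this sample the residual sum of squares is 1 and the total sum of squares is 5, so \(R^{2} = 1 - 1/5 = 0.8\), \(RMSE = \sqrt{1/4} = 0.5\), and \(MAE = 1/4 = 0.25\).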
Acknowledgments
The authors would like to thank the following colleagues for their help in revising the manuscript and for their ideas on how best to organize it: Bruno S. Faiçal, Leandro Y. Mano, Vinícius Gonçalves and Pedro H. Gomes. The authors would also like to thank Márcio Nirlando Gomes Lopes for his help in the development of Figure 3. Dr. J. Ueyama would like to acknowledge FAPESP, process 2018/17335-9.
Additional information
Communicated by: H. Babaie
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Vieira, A.C., Garcia, G., Pabón, R.E.C. et al. Improving flood forecasting through feature selection by a genetic algorithm – experiments based on real data from an Amazon rainforest river. Earth Sci Inform 14, 37–50 (2021). https://doi.org/10.1007/s12145-020-00528-8
DOI: https://doi.org/10.1007/s12145-020-00528-8