Abstract
Relative humidity (RH) is an important, highly stochastic component of the hydrological cycle, and accurate RH prediction can benefit several water resources engineering practices. In this study, the extreme gradient boosting (XGBoost) approach was used as an input-parameter selector and coupled with support vector regression (SVR), random forest (RF), and multivariate adaptive regression spline (MARS) models to simulate the RH process. Meteorological data from two stations in Iraq (Kut and Mosul) were used as a case study. Numerical and graphical indicators were used for model evaluation. In general, all models showed good prediction performance. The findings also confirmed the importance of all the meteorological data for RH simulation. Further, integrating the XGBoost approach identified the essential parameters for RH simulation at both stations and attained good predictability with fewer input parameters. At Kut station, the RF model attained the best prediction results, with the minimum root mean square error (RMSE = 4.92) and mean absolute error (MAE = 3.89), using the maximum air temperature and evaporation parameters. At Mosul station, the MARS model reported the best prediction results (RMSE = 3.80 and MAE = 2.86) using all the climate parameters. Overall, the results demonstrate the capability of the proposed coupled machine learning models for modeling RH at different locations within a semi-arid environment.
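The two numeric indicators quoted in the abstract, RMSE and MAE, can be illustrated with a minimal sketch. The function names and the sample RH values below are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical RH values in percent (made up for this example).
observed = [40.0, 55.0, 62.0, 48.0]
predicted = [43.0, 51.0, 62.0, 53.0]

print(f"RMSE = {rmse(observed, predicted):.2f}")  # RMSE = 3.54
print(f"MAE  = {mae(observed, predicted):.2f}")   # MAE  = 3.00
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same data and penalizes large individual errors more heavily, which is consistent with each reported RMSE exceeding its MAE.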
Data availability
Data are available upon request from the authors.
Acknowledgements
The authors acknowledge the data source provider: State Commission of Dams and Reservoirs, Ministry of Water Resources, Baghdad, Iraq. In addition, the authors acknowledge the support received from the doctoral scientific research initial funding project of Baoji University of Arts and Sciences (ZK2018062) and the Key Research, Development Program in Shaanxi Province (2019GY-131).
Author information
Contributions
HT contributed to concept, modeling, software, writing. SMA contributed to writing, data, validation, investigation. SQS contributed to validation, discussion, analysis, and writing. SSS contributed to data analysis, revision, editing, validation, investigation. ZMY contributed to data analysis, revision, editing, validation, investigation.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest to any party.
Consent to publish
The authors consent to the publication of this research.
Ethical approval
The research was conducted in accordance with the ethical guidelines advised by the NCAA journal.
About this article
Cite this article
Tao, H., Awadh, S.M., Salih, S.Q. et al. Integration of extreme gradient boosting feature selection approach with machine learning models: application of weather relative humidity prediction. Neural Comput & Applic 34, 515–533 (2022). https://doi.org/10.1007/s00521-021-06362-3
DOI: https://doi.org/10.1007/s00521-021-06362-3