Evaluation of stacking and blending ensemble learning methods for estimating daily reference evapotranspiration

doi:10.1016/j.compag.2021.106039

Computers and Electronics in Agriculture

Volume 184, May 2021, 106039

https://doi.org/10.1016/j.compag.2021.106039

Highlights

• Stacking/blending models were employed for daily ETo estimation for the first time.
• Stacking/blending models were compared with basic and empirical models.
• Stacking/blending models had better accuracy and portability across stations.
• Stacking/blending models had higher accuracy when data or inputs were limited.
• Blending models had similar accuracy to stacking models at a lower time cost.

Abstract

Precise estimation and prediction of reference evapotranspiration (ETo) is the first step toward efficient agricultural water resources management. As machine learning methods are widely applied in ETo estimation, we assess whether higher accuracy can be attained by stacking or integrating more models. Can the accuracy be increased indefinitely, and at what cost? To this end, this study reports the first evaluation of stacking and blending ensemble models for daily ETo estimation. The stacking and blending models adopted a 2-layer structure: the level-0 basic models were random forest (RF), support vector regression (SVR), multilayer perceptron neural network (MLP) and K-nearest neighbor regression (KNN), and level-1 produced the final result via linear regression (LR). The accuracy and computational costs of the stacking and blending models were compared with those of the 4 basic models and 3 empirical models under 5 complete and limited input conditions. A cross-station validation of the models with solar radiation input was further performed to study the portability of the tested models. The results indicated that both stacking and blending models performed better than the basic and empirical models regardless of input combination, and that the stacking models (R² ranged from 0.6602 to 0.9977, with an average AIC of −7785.68) achieved slightly higher accuracy than the blending models (R²: 0.6562–0.9974; average AIC: −7689.68). Meanwhile, the stacking and blending models were more portable (RMSE ranged from 0.5445 to 0.8799 and from 0.5511 to 0.8767 mm day⁻¹, respectively) than the basic models across stations in different climate zones. In terms of computational cost, both stacking and blending models achieved significantly better accuracy than the basic models in reasonable time with a smaller training data size, while the blending models obtained accuracy similar to that of the stacking models in less time once the size of the training data was increased. Therefore, the stacking and blending ensemble models are highly recommended for ETo estimation, especially when the available training data set or meteorological variables are limited.

Introduction

As one of the most crucial factors in the global ecological water cycle and surface energy balance, evapotranspiration (ET) plays a very important role in the estimation of crop water requirements, the development of irrigation schedules and efficient agricultural water management (Allen et al., 1998a, Fan et al., 2019, Feng et al., 2016, Tabari et al., 2013). Experimental methods such as the Bowen ratio energy balance and eddy covariance systems can measure ET directly. However, tedious operations and special precautions limit the application of such methods under most conditions (Allen et al., 1989, Shih, 1984, Tabari et al., 2013). In practice, the ET of a specific crop can alternatively be estimated by multiplying the relevant crop coefficient (K_c) by the reference evapotranspiration (ETo), which is calculated from meteorological data under standard underlying surface conditions. Currently, the FAO-56 Penman-Monteith (FAO-56 PM) method yields reliable ETo estimates worldwide and is regarded as one of the standards for evaluating other ETo estimation methods. The meteorological variables required by FAO-56 PM to calculate ETo include solar radiation, maximum and minimum air temperature, relative humidity and wind speed at 2 m height (Allen, 1998). However, meteorological data with complete variables are often unavailable in specific study areas because meteorological stations or some of their equipment are absent, which greatly limits the practical application of FAO-56 PM (Allen et al., 1998b, Jensen et al., 1997, Jensen and Allen, 2016).
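For orientation, the daily FAO-56 PM equation mentioned above can be sketched as follows. This is a minimal illustration of the standard FAO-56 form and not the authors' code. The function and argument names are hypothetical, net radiation `rn` is assumed to have been derived from solar radiation beforehand, actual vapour pressure is approximated from mean relative humidity, and the daily soil heat flux defaults to zero.

```python
import math

def sat_vapour_pressure(t_c):
    """Saturation vapour pressure (kPa) at air temperature t_c (deg C)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))

def fao56_pm_eto(t_max, t_min, rh_mean, u2, rn, elevation_m, g=0.0):
    """Daily reference evapotranspiration (mm/day), FAO-56 Penman-Monteith.

    t_max, t_min : daily maximum/minimum air temperature (deg C)
    rh_mean      : mean relative humidity (%)
    u2           : wind speed at 2 m height (m/s)
    rn           : net radiation (MJ m-2 day-1), assumed precomputed
    elevation_m  : station elevation (m)
    g            : soil heat flux (MJ m-2 day-1), about 0 at the daily scale
    """
    t_mean = (t_max + t_min) / 2.0
    # Slope of the saturation vapour pressure curve (kPa per deg C)
    delta = 4098.0 * sat_vapour_pressure(t_mean) / (t_mean + 237.3) ** 2
    # Atmospheric pressure (kPa) and psychrometric constant (kPa per deg C)
    pressure = 101.3 * ((293.0 - 0.0065 * elevation_m) / 293.0) ** 5.26
    gamma = 0.000665 * pressure
    # Saturation and actual vapour pressure (kPa)
    e_s = (sat_vapour_pressure(t_max) + sat_vapour_pressure(t_min)) / 2.0
    e_a = rh_mean / 100.0 * e_s  # simplification: uses mean RH only
    numerator = (0.408 * delta * (rn - g)
                 + gamma * 900.0 / (t_mean + 273.0) * u2 * (e_s - e_a))
    return numerator / (delta + gamma * (1.0 + 0.34 * u2))

def crop_et(kc, eto):
    """Crop evapotranspiration as crop coefficient times ETo."""
    return kc * eto

# Illustrative inputs only (not data from the study)
eto_example = fao56_pm_eto(t_max=21.5, t_min=12.3, rh_mean=73.5,
                           u2=2.78, rn=13.28, elevation_m=100.0)
```

With these illustrative inputs the function returns a value a little under 4 mm day⁻¹. The five inputs it needs are exactly the variables whose absence motivates the limited-input models discussed next.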

Empirical methods that achieve suitable accuracy with fewer meteorological variables have thus become an area of increased interest. A large number of empirical methods have been developed over the last 50 years by scientists and specialists worldwide to estimate ETo (Alexandris et al., 2008, Allen, 1997, Annandale et al., 2002, Jensen and Haise, 1963, Kisi, 2013). Presently, the empirical methods for estimating ETo can generally be divided into radiation-, temperature- and mass transfer-based methods (Tabari et al., 2013). The Jensen-Haise model was derived from numerous measured evapotranspiration data with estimated solar radiation in the western United States (Jensen and Haise, 1963) and provided a dimensionless energy balance equation for ETo estimation. Based on a simplification of the Penman-Monteith method, Priestley and Taylor (1972) established the Priestley-Taylor model, which uses the maximum and minimum temperature and solar radiation to estimate ETo in areas with low humidity. Other radiation-based models, e.g. Irmak (Irmak et al., 2003), Makkink (Makkink, 1957) and Turc (Turc, 1961), have also been widely implemented and recognized worldwide because solar radiation and temperature make the dominant contributions to ETo. Temperature-based methods require fewer and more accessible meteorological data but are correspondingly less accurate. Therefore, the extraterrestrial radiation (Ra) is commonly employed in such models to further improve the estimation accuracy. The Hargreaves-Samani model (Hargreaves and Samani, 1985), which is based on temperature data and Ra, has become the most widely applied empirical model as a result of its simple inputs and acceptable accuracy (Almorox et al., 2015). The mass transfer-based method, which is generally based on the theory of Dalton’s gas law (Shih, 1984), utilizes the concept of eddy transfer of water vapor from an evaporating surface to the atmosphere. However, previous studies (Djaman et al., 2015, Mehdizadeh et al., 2017, Shiri, 2018) have indicated that the estimation accuracy of mass transfer-based methods is usually worse than that of radiation- and temperature-based methods. Although these empirical methods have simplified the meteorological data requirements of ETo estimation, they are less capable on the daily scale, and further regression of the parameters in their equations is required to attain higher accuracy (Feng et al., 2017b, Torres et al., 2011a).
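As a concrete instance of a temperature-based method, the Hargreaves-Samani model described above can be sketched as below. The sketch uses the commonly cited form of the equation and the FAO-56 expression for extraterrestrial radiation. The function names are hypothetical, and the coefficients are the generic uncalibrated ones, not values calibrated or reported in this study.

```python
import math

def extraterrestrial_radiation(lat_deg, day_of_year):
    """Daily extraterrestrial radiation Ra (MJ m-2 day-1), FAO-56 form.

    Valid away from polar day/night, where the sunset hour angle is defined.
    """
    phi = math.radians(lat_deg)
    # Inverse relative Earth-Sun distance and solar declination (rad)
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    decl = 0.409 * math.sin(2.0 * math.pi * day_of_year / 365.0 - 1.39)
    # Sunset hour angle (rad)
    ws = math.acos(-math.tan(phi) * math.tan(decl))
    gsc = 0.0820  # solar constant, MJ m-2 min-1
    return (24.0 * 60.0 / math.pi) * gsc * dr * (
        ws * math.sin(phi) * math.sin(decl)
        + math.cos(phi) * math.cos(decl) * math.sin(ws))

def hargreaves_samani_eto(t_max, t_min, ra):
    """Daily ETo (mm/day) from temperature (deg C) and Ra (MJ m-2 day-1)."""
    t_mean = (t_max + t_min) / 2.0
    # 0.408 converts Ra from MJ m-2 day-1 to equivalent evaporation in mm/day
    return 0.0023 * (t_mean + 17.8) * math.sqrt(t_max - t_min) * 0.408 * ra

# Illustrative inputs only: latitude 20 deg S on day 246 of the year
ra_example = extraterrestrial_radiation(-20.0, 246)
eto_hs_example = hargreaves_samani_eto(30.0, 20.0, 32.2)
```

Because Ra depends only on latitude and date, the model needs nothing beyond daily maximum and minimum temperature at the station, which is what makes it attractive when radiation, humidity and wind records are missing.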

Machine learning (ML) has emerged alongside big data technologies and high-performance computing to offer a new approach for ETo estimation and prediction (Granata, 2019). Owing to their short computation time, high accuracy and notable portability, machine learning models have achieved strong results in this field. The common machine learning models applied for ETo estimation can be divided into the following categories: (1) Artificial Neural Network (ANN) based (Abdullah et al., 2015, Feng et al., 2016, Goyal et al., 2014, Kim et al., 2012, Kumar et al., 2002); (2) Kernel algorithm based (Fan et al., 2018, Goyal et al., 2014, Kisi, 2015, Kişi and Çimen, 2009, Mehdizadeh et al., 2017); (3) Classification and regression tree (CART) based (Fan et al., 2018, Feng et al., 2017a, Lu et al., 2018, Wu and Fan, 2019); (4) Meta-heuristic algorithm based (Ghorbani et al., 2018, Kisi and Alizamir, 2018a, Malik et al., 2020, Malik et al., 2017a, Rahimi et al., 2012); (5) Hybrid algorithm based (Doğan, 2009, Ghorbani et al., 2018, Goyal et al., 2014, Mosavi and Edalatifar, 2018, Tao et al., 2018). According to previous research, the extreme learning machine (ELM) based on ANN (Abdullah et al., 2015, Feng et al., 2016), support vector regression (SVR) based on kernel algorithms (Fan et al., 2018, Kişi and Çimen, 2009) and random forest (RF) based on CART (Fang et al., 2018, Wu et al., 2020a) have each achieved satisfactory accuracy within their own category, regardless of whether complete or limited meteorological input variables are available. In addition, machine learning models have obtained higher accuracy and better generalization ability than empirical methods under the same limited input variables (Fan et al., 2018, Wu and Fan, 2019).

To further improve the modelling accuracy, ensemble learning methods have attracted the attention of many researchers and have been widely implemented for the estimation of daily ETo values (Liakos et al., 2018, Saggi and Jain, 2019). The core idea of ensemble learning is to combine several basic models (weak learners) into a new model (strong learner) so as to reduce the bias, reduce the variance or improve the prediction results (Freund et al., 1999). The common ensemble learning methods can generally be classified as bagging, boosting and stacking/blending (Zhou, 2012). The bagging method trains a series of independent basic models on different bootstrap samples, which are obtained by subsampling the training data set with replacement, and outputs the regression result by weighted averaging of these basic models. The boosting method produces the final model by emphasizing, in each new basic model, the samples mispredicted by the previous basic model, and by combining the basic models in series via weighted averaging (Zhou, 2012). The random forest (RF) model based on the bagging concept (Ponraj and Vigneswaran, 2019), and the gradient boosting decision tree (GBDT) (Wu et al., 2019), extreme gradient boosting (XGBoost) (Fan et al., 2018), light gradient boosting machine (LightGBM) (Fan et al., 2019) and categorical boosting (CatBoost) (Huang et al., 2019) models based on the boosting concept have already achieved satisfactory performance in ETo estimation. Compared with the well-known bagging and boosting methods and their applications, the stacking/blending method is neither common nor popular in this field, but it has been the most adopted method in Kaggle machine learning competitions for further improving model accuracy. The stacking/blending method is able to combine the advantages of multiple basic models and has proven superior in intrusion detection (Syarif et al., 2012), short-term electricity consumption prediction (Divina et al., 2018) and automatic cataract detection and grading (Yang et al., 2016). However, among the current applications of ensemble models, stacking and blending methods have not yet been applied to the estimation of daily ETo. Is it possible to achieve a higher ETo estimation accuracy by stacking several basic models? Can the accuracy be infinitely improved by stacking or blending more models? Therefore, to further study the performance of the stacking/blending method in ETo modelling, this study investigated 5 stacking and blending models with complete and limited input variables for estimating daily ETo at 5 stations across 5 typical climate zones in China. The estimation results and computational cost were compared with those of every well-trained basic model included in the stacking or blending model and with those of empirical models to comprehensively evaluate the performance of stacking and blending models.
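The difference between stacking and blending can be illustrated with a small, self-contained sketch. It is a toy under stated assumptions, not the authors' implementation. Two simple level-0 learners (a one-feature linear fit and a KNN regressor) stand in for RF, SVR, MLP and KNN, the level-1 learner is a least-squares combination without intercept in place of a full linear regression, the data are synthetic, and all function names are hypothetical. Stacking trains the level-1 model on out-of-fold predictions from K-fold cross-validation, whereas blending trains it on predictions for a single hold-out split.

```python
import math
import random

def fit_linear(xs, ys):
    """Level-0 learner: one-feature ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_knn(xs, ys, k=3):
    """Level-0 learner: mean target of the k nearest training points."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

BASE_FITTERS = [fit_linear, fit_knn]

def fit_meta(feats, ys):
    """Level-1 learner: no-intercept least squares on two meta-features."""
    a11 = sum(f[0] * f[0] for f in feats)
    a12 = sum(f[0] * f[1] for f in feats)
    a22 = sum(f[1] * f[1] for f in feats)
    b1 = sum(f[0] * y for f, y in zip(feats, ys))
    b2 = sum(f[1] * y for f, y in zip(feats, ys))
    det = a11 * a22 - a12 * a12  # assumes base predictions are not collinear
    w1 = (b1 * a22 - a12 * b2) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return lambda f: w1 * f[0] + w2 * f[1]

def fit_stacking(xs, ys, n_folds=5):
    """Meta-features are out-of-fold predictions over the whole training set."""
    n = len(xs)
    meta_x = [None] * n
    for fold in range(n_folds):
        kept = [i for i in range(n) if i % n_folds != fold]
        held = [i for i in range(n) if i % n_folds == fold]
        models = [fit([xs[i] for i in kept], [ys[i] for i in kept])
                  for fit in BASE_FITTERS]
        for i in held:
            meta_x[i] = [m(xs[i]) for m in models]
    meta = fit_meta(meta_x, ys)
    bases = [fit(xs, ys) for fit in BASE_FITTERS]  # refit on all data
    return lambda x: meta([m(x) for m in bases])

def fit_blending(xs, ys, holdout=0.3):
    """Meta-features are predictions for one hold-out part of the data."""
    cut = int(len(xs) * (1.0 - holdout))
    bases = [fit(xs[:cut], ys[:cut]) for fit in BASE_FITTERS]
    meta_x = [[m(x) for m in bases] for x in xs[cut:]]
    meta = fit_meta(meta_x, ys[cut:])
    return lambda x: meta([m(x) for m in bases])

def rmse(model, xs, ys):
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def make_data(n, rng):
    """Synthetic one-feature regression data (a stand-in, not ETo records)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [0.4 * x + math.sin(x) + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys

rng = random.Random(0)
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(100, rng)
rmse_linear = rmse(fit_linear(train_x, train_y), test_x, test_y)
rmse_stack = rmse(fit_stacking(train_x, train_y), test_x, test_y)
rmse_blend = rmse(fit_blending(train_x, train_y), test_x, test_y)
```

In this sketch stacking fits every level-0 learner once per fold plus once on the full data, while blending fits each learner only once, which is the kind of trade-off between accuracy and training time that the study sets out to quantify.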

Section snippets

Study area description

According to differences in annual precipitation, accumulated temperature and altitude, the climate zones of China are generally divided into 5 typical types (Fan et al., 2016): temperate monsoon zone (TMZ), temperate continental zone (TCZ), subtropical monsoon zone (SMZ), mountain plateau zone (MPZ) and tropical monsoon zone (TPMZ). The annual average pan evaporation values of the above 5 climate zones are 1475, 2148, 1545, 1883 and 1175 mm, respectively. Due to the clear evaporation

Comparison of the performance of different models on daily scale

Tables 6–10 summarize the performance of the basic, blending and stacking models with different input meteorological variables at the Taiyuan, Urumqi, Wuhan, Lhasa and Guangzhou stations, respectively. Fig. 4 shows scatter diagrams comparing the ETo values estimated by the tested models with those estimated by the FAO-56 PM model at the 5 stations.
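The accuracy indicators quoted in the abstract (R², RMSE and AIC) can be computed along the following lines. This is a hedged sketch: the exact formulas used in the study are not given in this excerpt, so R² is taken here as 1 − SS_res/SS_tot (some studies use the squared correlation coefficient instead), AIC uses the common least-squares form n·ln(RSS/n) + 2k, and the function names and sample values are hypothetical.

```python
import math

def rmse(obs, pred):
    """Root mean square error, in the units of the target (e.g. mm/day)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def aic_least_squares(obs, pred, n_params):
    """AIC in its least-squares form: n * ln(RSS / n) + 2 * k (assumed form)."""
    n = len(obs)
    rss = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return n * math.log(rss / n) + 2 * n_params

# Made-up values for illustration; obs plays the role of FAO-56 PM ETo
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
metrics = (rmse(obs, pred), r_squared(obs, pred),
           aic_least_squares(obs, pred, n_params=2))
```

Lower RMSE and AIC and higher R² indicate a better model; under this form AIC also grows with the number of fitted parameters.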

As shown in Table 6, models with complete meteorological variable input (M1) attained the best performance over all input

Conclusion

This study first introduced stacking and blending ensemble models for daily ETo estimation and the results were compared with those of four basic models (RF, SVR, MLP and KNN) included in the ensemble models and three empirical models (FAO48-PM, MK and HS) under complete and limited input conditions. The results indicated that the ETo estimation accuracy of both stacking and blending models was higher than that of the basic and empirical models regardless of the input combinations and the

CRediT authorship contribution statement

Tianao Wu: Conceptualization, Methodology, Software, Formal analysis, Writing - original draft, Writing - review & editing. Wei Zhang: Data curation, Writing - original draft. Xiyun Jiao: Supervision, Validation. Weihua Guo: Validation, Visualization, Funding acquisition. Yousef Alhaj Hamoud: Formal analysis, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This study was financially supported by the National Natural Science Foundation of China (No. 51609064) and the Fundamental Research Funds for the Central Universities (B19020185). We sincerely thank the National Climatic Centre of the China Meteorological Administration for providing the daily meteorological database used in this study.

References (74)

S.S. Abdullah et al.
Extreme Learning Machines: A new approach for prediction of reference evapotranspiration
J. Hydrol.
(2015)

A.A. Aburomman et al.
A novel SVM-kNN-PSO ensemble method for intrusion detection system
Appl. Soft Comput.
(2016)

K. Djaman et al.
Evaluation of sixteen reference evapotranspiration methods under Sahelian conditions in the Senegal River Valley
J. Hydrol. Reg. Stud.
(2015)

J. Fan et al.
Light Gradient Boosting Machine: An efficient soft computing model for estimating daily reference evapotranspiration with local and external meteorological data
Agric. Water Manag.
(2019)

J. Fan et al.
Evaluation of SVM, ELM and four tree-based ensemble models for predicting daily reference evapotranspiration using limited meteorological data in different climates of China
Agric. For. Meteorol.
(2018)

Y. Feng et al.
Evaluation of random forests and generalized regression neural networks for daily reference evapotranspiration modelling
Agric. Water Manag.
(2017)

Y. Feng et al.
Comparison of ELM, GANN, WNN and empirical models for estimating reference evapotranspiration in humid region of Southwest China
J. Hydrol.
(2016)

M.K. Goyal et al.
Modeling of daily pan evaporation in sub tropical climates using ANN, LS-SVR, Fuzzy Logic, and ANFIS
Expert Syst. Appl.
(2014)

G. Huang et al.
Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions
J. Hydrol.
(2019)

O. Kisi
Pan evaporation modeling using least square support vector machine, multivariate adaptive regression splines and M5 model tree
J. Hydrol.
(2015)

O. Kisi et al.
Modelling reference evapotranspiration using a new wavelet conjunction heuristic method: Wavelet extreme learning machine vs wavelet neural networks
Agric. For. Meteorol.
(2018)

G. Landeras et al.
Comparison of artificial neural network models and empirical and semi-empirical equations for daily reference evapotranspiration estimation in the Basque Country (Northern Spain)
Agric. Water Manag.
(2008)

X. Lu et al.
Daily pan evaporation modeling from local and cross-station data using three tree-based machine learning models
J. Hydrol.
(2018)

A. Malik et al.
Monthly pan-evaporation estimation in Indian central Himalayas using different heuristic approaches and climate based models
Comput. Electron. Agric.
(2017)

A. Malik et al.
Daily suspended sediment concentration simulation using hydrological data of Pranhita River Basin, India
Comput. Electron. Agric.
(2017)

S. Mehdizadeh et al.
Using MARS, SVM, GEP and empirical equations for estimation of monthly mean reference evapotranspiration
Comput. Electron. Agric.
(2017)

I. Rahimi et al.
Calibration of Angstrom equation for estimating solar radiation using meta-heuristic harmony search algorithm (case study: Mashhad-East of Iran)
Energy Procedia
(2012)

J. Shiri
Improving the performance of the mass transfer-based reference evapotranspiration estimation approaches through a coupled wavelet-random forest methodology
J. Hydrol.
(2018)

H. Tao et al.
Reference evapotranspiration prediction using hybridized fuzzy model with firefly algorithm: Regional case study in Burkina Faso
Agric. Water Manag.
(2018)

A.F. Torres et al.
Forecasting daily potential evapotranspiration using machine learning and limited climatic data
Agric. Water Manag.
(2011)

Akaike, H., 1974. A New Look at the Statistical Model Identification. IEEE Trans. Automat. Contr....

S. Alexandris et al.
Comparative analysis of reference evapotranspiration from the surface of rainfed grass in central Serbia, calculated by six empirical methods against the Penman-Monteith formula
Eur. Water
(2008)

Allen
Self-calibrating method for estimating solar radiation from air temperature
J. Hydrol. Eng.
(1997)

Allen, Jensen, M.E., Wright, J.L., Burman, R.D., 1989. Operational estimates of reference evapotranspiration. Agron. J....

Allen, Pereira, L.S., Raes, D., Smith, M., 1998a. Crop evapotranspiration: Guidelines for computing crop requirements....

Allen, Pereira, L.S., Raes, D., Smith, M., 1998b. Crop evapotranspiration-Guidelines for computing crop water...

Allen, R., Pereira, L., Raes, D., Smith, M., 1998. Guidelines for computing crop water requirements-FAO Irrigation and...

J. Almorox et al.
Global performance ranking of temperature-based approaches for evapotranspiration estimation considering Köppen climate classes
J. Hydrol.
(2015)

J. Annandale et al.
Software for missing data error analysis of Penman-Monteith reference evapotranspiration
Irrig. Sci.
(2002)

L. Breiman
Random forests
Mach. Learn.
(2001)

T. Cover et al.
Nearest neighbor pattern classification
IEEE Trans. Inf. Theory
(1967)

H.A.R. De Bruin et al.
Reference crop evapotranspiration determined with a modified Makkink equation
Hydrol. Process.
(1998)

F. Divina et al.
Stacking ensemble learning for short-term electricity consumption forecasting
Energies
(2018)

E. Doğan
Reference evapotranspiration estimation using adaptive neuro-fuzzy inference systems
Irrig. Drain. J. Int. Comm. Irrig. Drain.
(2009)

J. Fan et al.
Climate change effects on reference crop evapotranspiration across different climatic zones of China during 1956–2015
J. Hydrol.
(2016)

W. Fang et al.
Reference evapotranspiration forecasting based on local meteorological and global climate information screened by partial mutual information
J. Hydrol.
(2018)

Y. Feng et al.
Modeling reference evapotranspiration using extreme learning machine and generalized regression neural network only with temperature data
Comput. Electron. Agric.
(2017)

