Original papers
Evaluation of stacking and blending ensemble learning methods for estimating daily reference evapotranspiration

https://doi.org/10.1016/j.compag.2021.106039

Highlights

  • Stacking/blending models were employed for daily ETo estimation for the first time.

  • Stacking/blending models were compared with basic and empirical models.

  • Stacking/blending models had better accuracy and portability across stations.

  • Stacking/blending models had higher accuracy when data or inputs were limited.

  • Blending models had similar accuracy to stacking models at a lower time cost.

Abstract

Precise reference evapotranspiration (ETo) estimation and prediction are the first steps toward efficient agricultural water resources management. As machine learning methods are widely applied to ETo estimation, we ask whether higher accuracy can be attained by stacking or integrating more models: can the accuracy be increased indefinitely, and at what cost? To this end, this study reports the first evaluation of stacking and blending ensemble models for daily ETo estimation. The stacking and blending models adopted a two-layer structure: the level-0 basic models were random forest (RF), support vector regression (SVR), multilayer perceptron neural network (MLP) and K-nearest neighbor regression (KNN), and the level-1 model produced the final output via linear regression (LR). The accuracy and computational costs of the stacking and blending models were compared with those of the 4 basic models and 3 empirical models under 5 complete and limited input conditions. A cross-station validation of the models with solar radiation input was further performed to study the portability of the tested models. The results indicated that both stacking and blending models performed better than the basic and empirical models regardless of input combination, and that the stacking models (R2: 0.6602–0.9977; average AIC: −7785.68) achieved slightly higher accuracy than the blending models (R2: 0.6562–0.9974; average AIC: −7689.68). Meanwhile, the stacking and blending models were more portable than the basic models across stations in different climate zones (RMSE: 0.5445–0.8799 and 0.5511–0.8767 mm day−1, respectively). In terms of computational cost, both stacking and blending models achieved significantly better accuracy than the basic models within a reasonable time when the training data set was small, while with larger training data sets the blending models obtained accuracy similar to that of the stacking models in less time. 
Therefore, the stacking and blending ensemble models can be highly recommended for ETo estimation, especially when the available training data set or meteorological variables are limited.
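
The two-layer structure described above can be sketched with scikit-learn's `StackingRegressor`. This is a minimal illustration, not the authors' code: the hyperparameters, the 5-fold setting, the scaling pipelines and the synthetic data are all assumptions, since the abstract does not report them.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def build_stacking_model(random_state=0):
    """Two-layer stacking: RF, SVR, MLP and KNN at level 0, linear regression at level 1."""
    level0 = [
        ("rf", RandomForestRegressor(n_estimators=100, random_state=random_state)),
        # SVR, MLP and KNN are sensitive to feature scale, so each is wrapped with a scaler
        ("svr", make_pipeline(StandardScaler(), SVR(C=10.0))),
        ("mlp", make_pipeline(StandardScaler(),
                              MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                                           random_state=random_state))),
        ("knn", make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=10))),
    ]
    # cv=5: the level-1 LR is trained on 5-fold out-of-fold predictions of the level-0
    # models; every level-0 model is then refitted on the full training set
    return StackingRegressor(estimators=level0, final_estimator=LinearRegression(), cv=5)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic placeholder for 5 daily meteorological inputs and an ETo-like target
    X = rng.uniform(0, 1, size=(600, 5))
    y = 2 + 3 * X[:, 0] + 2 * X[:, 1] * X[:, 2] - X[:, 3] + 0.1 * rng.normal(size=600)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    model = build_stacking_model().fit(X_tr, y_tr)
    print("level-1 LR weights:", model.final_estimator_.coef_)
    print(f"test R2 = {r2_score(y_te, model.predict(X_te)):.3f}")
```

The level-1 LR learns one weight per level-0 model, so its coefficients give a rough view of how much each basic model contributes to the final estimate.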

Introduction

As one of the most crucial components of the global ecological water cycle and surface energy balance, evapotranspiration (ET) plays a very important role in estimating crop water requirements, developing irrigation schedules and managing agricultural water efficiently (Allen et al., 1998a, Fan et al., 2019, Feng et al., 2016, Tabari et al., 2013). Experimental methods such as the Bowen ratio energy balance and eddy covariance systems can measure ET directly; however, tedious operation and special precautions limit their application under most conditions (Allen et al., 1989, Shih, 1984, Tabari et al., 2013). In practice, the ET of a specific crop can instead be estimated by multiplying the reference evapotranspiration (ETo), which is calculated from meteorological data for standard reference surface conditions, by the corresponding crop coefficient (Kc). Currently, the FAO-56 Penman-Monteith (FAO-56 PM) method yields reliable ETo estimates worldwide and is regarded as one of the standards for evaluating other ETo estimation methods. The meteorological variables required by FAO-56 PM to calculate ETo include solar radiation, maximum and minimum air temperature, relative humidity and wind speed at 2 m height (Allen, 1998). However, complete meteorological data are often unavailable in specific study areas because weather stations, or some of their instruments, are absent, which greatly limits the practical application of FAO-56 PM (Allen et al., 1998b, Jensen et al., 1997, Jensen and Allen, 2016).
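
For reference, the daily FAO-56 PM equation can be sketched in Python as below. This is a simplified sketch under stated assumptions: net radiation `rn` is passed in directly (the full FAO-56 procedure derives it from solar radiation, albedo and net longwave radiation), actual vapour pressure defaults to the mean-relative-humidity option, and the function and parameter names are hypothetical.

```python
import math


def sat_vapour_pressure(t_c):
    """Saturation vapour pressure e°(T) in kPa for air temperature t_c in °C."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def fao56_pm_daily(t_max, t_min, rh_mean, u2, rn, g=0.0, altitude=0.0, ea=None):
    """Daily FAO-56 PM ETo in mm/day (hypothetical helper).

    t_max, t_min: °C; rh_mean: %; u2: wind speed at 2 m (m/s);
    rn: net radiation (MJ m-2 day-1), assumed already derived from solar radiation;
    g: soil heat flux (about 0 at the daily scale); altitude: m;
    ea: optional actual vapour pressure (kPa) that overrides the RH-based estimate.
    """
    t_mean = (t_max + t_min) / 2
    es = (sat_vapour_pressure(t_max) + sat_vapour_pressure(t_min)) / 2
    if ea is None:
        ea = rh_mean / 100 * es  # simplest humidity option, using mean relative humidity
    delta = 4098 * sat_vapour_pressure(t_mean) / (t_mean + 237.3) ** 2  # slope, kPa/°C
    pressure = 101.3 * ((293 - 0.0065 * altitude) / 293) ** 5.26          # kPa
    gamma = 0.665e-3 * pressure                                           # kPa/°C
    numerator = 0.408 * delta * (rn - g) + gamma * 900 / (t_mean + 273) * u2 * (es - ea)
    return numerator / (delta + gamma * (1 + 0.34 * u2))
```

For illustrative (assumed) inputs, `fao56_pm_daily(21.5, 12.3, 73.5, 2.078, 13.28, altitude=100, ea=1.409)` returns roughly 3.9 mm day−1. The need for radiation, humidity and wind inputs is exactly what makes the method hard to apply when station data are incomplete.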

Empirical methods that achieve suitable accuracy with fewer meteorological variables have thus become an area of increasing interest. A large number of empirical methods have been developed over the last 50 years by scientists and specialists worldwide to estimate ETo (Alexandris et al., 2008, Allen, 1997, Annandale et al., 2002, Jensen and Haise, 1963, Kisi, 2013). Presently, empirical ETo methods can generally be divided into radiation-, temperature- and mass transfer-based methods (Tabari et al., 2013). The Jensen-Haise model was derived from numerous measured evapotranspiration data and estimated solar radiation in the western United States (Jensen and Haise, 1963) and provides a dimensionless energy balance equation for ETo estimation. Based on a simplification of the Penman-Monteith method, Priestley and Taylor (1972) established the Priestley-Taylor model, which uses maximum and minimum temperature and solar radiation to estimate ETo in areas with low humidity. Other radiation-based models, e.g. Irmak (Irmak et al., 2003), Makkink (Makkink, 1957) and Turc (Turc, 1961), have also been widely implemented and recognized worldwide because solar radiation and temperature are the dominant contributors to ETo. Temperature-based methods require fewer and more readily available meteorological data but are correspondingly less accurate; therefore, extraterrestrial radiation (Ra) is commonly included in such models to improve their accuracy. The Hargreaves-Samani model (Hargreaves and Samani, 1985), which is based on temperature data and Ra, has become the most widely applied empirical model owing to its simple inputs and acceptable accuracy (Almorox et al., 2015). Mass transfer-based methods, which are generally based on Dalton's law (Shih, 1984), use the concept of eddy transfer of water vapor from an evaporating surface to the atmosphere. 
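
The radiation- and temperature-based models named above reduce to short formulas. The sketch below uses commonly published forms of the Hargreaves-Samani, Priestley-Taylor (α = 1.26) and Makkink equations; the excerpt does not state which coefficient variants the authors used, so the constants and function names here are assumptions. Priestley-Taylor is written in terms of net radiation, which in practice is derived from measured solar radiation and temperature.

```python
import math

LAMBDA = 2.45  # latent heat of vaporization (MJ/kg), the usual FAO-56 constant


def slope_svp(t_mean):
    """Slope of the saturation vapour pressure curve, delta (kPa/°C)."""
    es = 0.6108 * math.exp(17.27 * t_mean / (t_mean + 237.3))
    return 4098 * es / (t_mean + 237.3) ** 2


def psychrometric_constant(altitude=0.0):
    """Psychrometric constant gamma (kPa/°C) from station altitude (m)."""
    pressure = 101.3 * ((293 - 0.0065 * altitude) / 293) ** 5.26
    return 0.665e-3 * pressure


def hargreaves_samani(t_max, t_min, ra):
    """Temperature-based ETo (mm/day); ra = extraterrestrial radiation, MJ m-2 day-1."""
    t_mean = (t_max + t_min) / 2
    # 0.408 converts MJ m-2 day-1 to equivalent evaporation in mm/day
    return 0.0023 * (t_mean + 17.8) * math.sqrt(t_max - t_min) * 0.408 * ra


def priestley_taylor(t_max, t_min, rn, g=0.0, altitude=0.0, alpha=1.26):
    """Radiation-based ETo (mm/day) from net radiation rn and soil heat flux g."""
    t_mean = (t_max + t_min) / 2
    d, gam = slope_svp(t_mean), psychrometric_constant(altitude)
    return alpha * d / (d + gam) * (rn - g) / LAMBDA


def makkink(t_max, t_min, rs, altitude=0.0):
    """Radiation-based ETo (mm/day) from incoming solar radiation rs."""
    t_mean = (t_max + t_min) / 2
    d, gam = slope_svp(t_mean), psychrometric_constant(altitude)
    return 0.61 * d / (d + gam) * rs / LAMBDA - 0.12
```

For example, `hargreaves_samani(30, 15, 40)` evaluates to about 5.86 mm day−1, using only two temperatures and a radiation term that can be computed from latitude and date.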
However, previous studies (Djaman et al., 2015, Mehdizadeh et al., 2017, Shiri, 2018) have indicated that mass transfer-based methods are usually less accurate than radiation- and temperature-based methods. Although these empirical methods reduce the meteorological data required for ETo estimation, they perform less well at the daily scale, and the parameters in their equations must be further calibrated by regression to attain higher accuracy (Feng et al., 2017b, Torres et al., 2011a).
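
The regression-based recalibration mentioned above can be illustrated for Hargreaves-Samani: its coefficient (0.0023 by default) is refitted by least squares through the origin so that the model best matches FAO-56 PM ETo at a station. This is a hypothetical sketch; the reference series below is synthetic and the function names are illustrative.

```python
import numpy as np


def hs_predictor(t_max, t_min, ra):
    """Hargreaves-Samani term without its coefficient: (Tmean + 17.8) * sqrt(Tmax - Tmin) * 0.408 * Ra."""
    t_max, t_min, ra = map(np.asarray, (t_max, t_min, ra))
    t_mean = (t_max + t_min) / 2
    return (t_mean + 17.8) * np.sqrt(t_max - t_min) * 0.408 * ra


def calibrate_hs_coefficient(t_max, t_min, ra, eto_ref):
    """Least-squares coefficient C (regression through the origin) so that C * x ≈ reference ETo."""
    x = hs_predictor(t_max, t_min, ra)
    y = np.asarray(eto_ref, dtype=float)
    return float(x @ y / (x @ x))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n = 365
    t_min = rng.uniform(0, 20, n)
    t_max = t_min + rng.uniform(5, 15, n)
    ra = rng.uniform(15, 40, n)  # MJ m-2 day-1
    # Synthetic "FAO-56 PM" series generated with a coefficient of 0.0027 plus noise
    eto_ref = 0.0027 * hs_predictor(t_max, t_min, ra) + rng.normal(0, 0.1, n)
    c = calibrate_hs_coefficient(t_max, t_min, ra, eto_ref)
    print(f"calibrated C = {c:.5f} (default 0.0023)")
```

Because the model is linear in its coefficient, the fit has a closed form; more flexible recalibrations (e.g. also refitting the 17.8 offset or the 0.5 exponent) would need nonlinear least squares.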

Machine learning (ML), which has emerged alongside big data technologies and high-performance computing, offers a new approach to ETo estimation and prediction (Granata, 2019). Owing to their short computation time, high accuracy and notable portability, ML models have achieved considerable success in this field. The ML models commonly applied to ETo estimation can be divided into the following categories: (1) artificial neural network (ANN) based (Abdullah et al., 2015, Feng et al., 2016, Goyal et al., 2014, Kim et al., 2012, Kumar et al., 2002); (2) kernel algorithm based (Fan et al., 2018, Goyal et al., 2014, Kisi, 2015, Kişi and Çimen, 2009, Mehdizadeh et al., 2017); (3) classification and regression tree (CART) based (Fan et al., 2018, Feng et al., 2017a, Lu et al., 2018, Wu and Fan, 2019); (4) meta-heuristic algorithm based (Ghorbani et al., 2018, Kisi and Alizamir, 2018a, Malik et al., 2020, Malik et al., 2017a, Rahimi et al., 2012); (5) hybrid algorithm based (Doğan, 2009, Ghorbani et al., 2018, Goyal et al., 2014, Mosavi and Edalatifar, 2018, Tao et al., 2018). According to previous research, the extreme learning machine (ELM) among ANN-based models (Abdullah et al., 2015, Feng et al., 2016), support vector regression (SVR) among kernel-based models (Fan et al., 2018, Kişi and Çimen, 2009) and random forest (RF) among CART-based models (Fang et al., 2018, Wu et al., 2020a) have achieved satisfactory accuracy within their respective categories, with either complete or limited meteorological inputs. In addition, ML models have obtained higher accuracy and better generalization ability than empirical methods given the same limited input variables (Fan et al., 2018, Wu and Fan, 2019).

To further improve modelling accuracy, ensemble learning methods have attracted the attention of many researchers and been widely implemented for estimating daily ETo (Liakos et al., 2018, Saggi and Jain, 2019). The core idea of ensemble learning is to combine several basic models (weak learners) into a new model (strong learner) that reduces bias, reduces variance or otherwise improves the predictions (Freund et al., 1999). Common ensemble learning methods can generally be classified as bagging, boosting and stacking/blending (Zhou, 2012). Bagging trains a series of independent basic models on different bootstrap samples, drawn from the training data set with replacement, and outputs the regression result by averaging the predictions of these basic models. Boosting trains basic models in series, with each new model emphasizing the samples mispredicted by the previous ones, and combines them via a weighted sum (Zhou, 2012). Random forest (RF), based on the bagging concept (Ponraj and Vigneswaran, 2019), and the boosting-based gradient boosting decision tree (GBDT) (Wu et al., 2019), extreme gradient boosting (XGBoost) (Fan et al., 2018), light gradient boosting machine (LightGBM) (Fan et al., 2019) and categorical boosting (CatBoost) (Huang et al., 2019) have already performed satisfactorily in ETo estimation. Compared with the well-known bagging and boosting methods and their applications, the stacking/blending method is neither common nor popular, although it is one of the most frequently adopted methods in Kaggle machine learning competitions for further improving model accuracy. 
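
The difference between bagging and boosting can be sketched from scratch with scikit-learn decision trees. Two assumptions apply: bagging uses an equal-weight average, and boosting is shown in its gradient boosting form (each tree fits the residuals of the running ensemble, as in GBDT-type models) rather than AdaBoost-style sample reweighting. Function names and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def bagging_predict(X, y, X_new, n_models=50, seed=0):
    """Bagging: independent trees on bootstrap samples, combined by a plain average."""
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        tree = DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx])
        preds.append(tree.predict(X_new))
    return np.mean(preds, axis=0)


def boosting_predict(X, y, X_new, n_rounds=100, learning_rate=0.1, max_depth=3):
    """Gradient boosting with squared loss: each shallow tree fits the current residuals."""
    f_train = np.full(len(y), y.mean())
    f_new = np.full(len(X_new), y.mean())
    for _ in range(n_rounds):
        residual = y - f_train  # errors left by the trees built so far
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, residual)
        # add the new tree, shrunk by the learning rate, to the running sum
        f_train += learning_rate * tree.predict(X)
        f_new += learning_rate * tree.predict(X_new)
    return f_new
```

The bagged trees are deep and independent, so averaging mainly reduces variance; the boosted trees are shallow and sequential, so the ensemble mainly reduces bias.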
The stacking/blending method can combine the advantages of multiple basic models and has proven superior in intrusion detection (Syarif et al., 2012), short-term electricity consumption prediction (Divina et al., 2018) and automatic cataract detection and grading (Yang et al., 2016). However, to date, stacking and blending methods have not been applied to daily ETo estimation. Is it possible to achieve higher ETo estimation accuracy by stacking several basic models? Can the accuracy be improved indefinitely by stacking or blending more models? To study the performance of stacking/blending methods in ETo modelling, this study investigated 5 stacking and blending models with complete and limited input variables for estimating daily ETo at 5 stations across 5 typical climate zones in China. The estimation results and computational costs were compared with those of each well-trained basic model involved in the stacking or blending models, and with empirical models, to comprehensively evaluate the performance of stacking and blending models.
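
Blending differs from stacking mainly in how the level-1 training data are produced: instead of K-fold out-of-fold predictions, a single holdout split is used. scikit-learn has no built-in blending estimator, so the `SimpleBlender` class below is a hypothetical helper showing one common variant; the holdout fraction, the base-model settings and whether the paper refits level-0 models on all data are assumptions not stated in the excerpt.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


class SimpleBlender:
    """Hypothetical helper (not a scikit-learn class): holdout-based two-layer blending."""

    def __init__(self, base_models, holdout_size=0.2, random_state=0):
        self.base_models = base_models
        self.holdout_size = holdout_size
        self.random_state = random_state

    def fit(self, X, y):
        # Split once: level-0 models learn from one part, the level-1 LR from the other
        X_fit, X_hold, y_fit, y_hold = train_test_split(
            X, y, test_size=self.holdout_size, random_state=self.random_state)
        self.models_ = [clone(m).fit(X_fit, y_fit) for m in self.base_models]  # one fit each
        self.meta_ = LinearRegression().fit(self._level0(X_hold), y_hold)
        return self

    def _level0(self, X):
        # One column per level-0 model; these predictions are the level-1 features
        return np.column_stack([m.predict(X) for m in self.models_])

    def predict(self, X):
        return self.meta_.predict(self._level0(X))


def default_base_models(seed=0):
    """RF, SVR, MLP and KNN with illustrative (assumed) hyperparameters."""
    return [
        RandomForestRegressor(n_estimators=100, random_state=seed),
        make_pipeline(StandardScaler(), SVR(C=10.0)),
        make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=seed)),
        make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=10)),
    ]
```

Usage follows the scikit-learn pattern, e.g. `SimpleBlender(default_base_models()).fit(X_train, y_train).predict(X_test)`. Because each level-0 model is fitted once, rather than once per fold plus a final refit as in K-fold stacking, this design plausibly accounts for part of the time saving reported for blending; the trade-off is that the level-1 LR learns only from the holdout rows.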


Study area description

According to differences in annual precipitation, accumulated temperature and altitude, China is generally divided into 5 typical climate zones (Fan et al., 2016): the temperate monsoon zone (TMZ), temperate continental zone (TCZ), subtropical monsoon zone (SMZ), mountain plateau zone (MPZ) and tropical monsoon zone (TPMZ). The annual average pan evaporation in these 5 climate zones is 1475, 2148, 1545, 1883 and 1175 mm, respectively. Due to the clear evaporation

Comparison of the performance of different models on daily scale

Tables 6 to 10 summarize the performance of the basic, blending and stacking models with different input meteorological variables at the Taiyuan, Urumqi, Wuhan, Lhasa and Guangzhou stations, respectively. Fig. 4 shows scatter plots comparing the ETo values estimated by the tested models with those calculated by the FAO-56 PM model at the 5 stations.
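
The accuracy measures quoted for these models (R2, RMSE and AIC) can be computed as follows. The excerpt cites Akaike (1974) but does not give the exact AIC expression or the definition of R2 used, so the least-squares form AIC = n ln(RSS/n) + 2k and R2 = 1 − SS_res/SS_tot below are assumptions.

```python
import numpy as np


def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))


def rmse(obs, pred):
    """Root mean square error, in the units of ETo (mm/day)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def aic(obs, pred, n_params):
    """AIC for least-squares fits: n * ln(RSS / n) + 2k, with k = number of model parameters."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    n = obs.size
    rss = np.sum((obs - pred) ** 2)
    return float(n * np.log(rss / n) + 2 * n_params)


if __name__ == "__main__":
    obs, pred = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]
    print(round(r2(obs, pred), 3))       # 0.98
    print(round(rmse(obs, pred), 3))     # 0.158
    print(round(aic(obs, pred, 2), 3))   # -10.756
```

Under this form, a long daily test series with MSE below 1 makes n·ln(RSS/n) large and negative, so AIC values thousands below zero, like those in the abstract, are expected; lower AIC indicates a better fit after penalizing model complexity.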

As shown in Table 6, models with complete meteorological variable input (M1) attained the best performance over all input

Conclusion

This study introduced stacking and blending ensemble models for daily ETo estimation for the first time and compared the results with those of the four basic models (RF, SVR, MLP and KNN) included in the ensemble models and three empirical models (FAO48-PM, MK and HS) under complete and limited input conditions. The results indicated that the ETo estimation accuracy of both stacking and blending models was higher than that of the basic and empirical models regardless of the input combinations and the

CRediT authorship contribution statement

Tianao Wu: Conceptualization, Methodology, Software, Formal analysis, Writing - original draft, Writing - review & editing. Wei Zhang: Data curation, Writing - original draft. Xiyun Jiao: Supervision, Validation. Weihua Guo: Validation, Visualization, Funding acquisition. Yousef Alhaj Hamoud: Formal analysis, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This study was financially supported by the National Natural Science Foundation of China (No. 51609064) and the Fundamental Research Funds for the Central Universities (B19020185). We sincerely thank the National Climatic Centre of the China Meteorological Administration for providing the daily meteorological database used in this study.

References (74)

  • O. Kisi et al.

    Modelling reference evapotranspiration using a new wavelet conjunction heuristic method: Wavelet extreme learning machine vs wavelet neural networks

    Agric. For. Meteorol.

    (2018)
  • G. Landeras et al.

    Comparison of artificial neural network models and empirical and semi-empirical equations for daily reference evapotranspiration estimation in the Basque Country (Northern Spain)

    Agric. Water Manag.

    (2008)
  • X. Lu et al.

    Daily pan evaporation modeling from local and cross-station data using three tree-based machine learning models

    J. Hydrol.

    (2018)
  • A. Malik et al.

    Monthly pan-evaporation estimation in Indian central Himalayas using different heuristic approaches and climate based models

    Comput. Electron. Agric.

    (2017)
  • A. Malik et al.

    Daily suspended sediment concentration simulation using hydrological data of Pranhita River Basin, India

    Comput. Electron. Agric.

    (2017)
  • S. Mehdizadeh et al.

    Using MARS, SVM, GEP and empirical equations for estimation of monthly mean reference evapotranspiration

    Comput. Electron. Agric.

    (2017)
  • I. Rahimi et al.

    Calibration of Angstrom equation for estimating solar radiation using meta-heuristic harmony search algorithm (case study: Mashhad-East of Iran)

    Energy Procedia

    (2012)
  • J. Shiri

    Improving the performance of the mass transfer-based reference evapotranspiration estimation approaches through a coupled wavelet-random forest methodology

    J. Hydrol.

    (2018)
  • H. Tao et al.

    Reference evapotranspiration prediction using hybridized fuzzy model with firefly algorithm: Regional case study in Burkina Faso

    Agric. Water Manag.

    (2018)
  • A.F. Torres et al.

    Forecasting daily potential evapotranspiration using machine learning and limited climatic data

    Agric. Water Manag.

    (2011)
  • Akaike, H., 1974. A New Look at the Statistical Model Identification. IEEE Trans. Automat. Contr....
  • S. Alexandris et al.

    Comparative analysis of reference evapotranspiration from the surface of rainfed grass in central Serbia, calculated by six empirical methods against the Penman-Monteith formula

    Eur. Water

    (2008)
  • Allen

    Self-calibrating method for estimating solar radiation from air temperature

    J. Hydrol. Eng.

    (1997)
  • Allen, Jensen, M.E., Wright, J.L., Burman, R.D., 1989. Operational estimates of reference evapotranspiration. Agron. J....
  • Allen, Pereira, L.S., Raes, D., Smith, M., 1998a. Crop evapotranspiration: Guidelines for computing crop requirements....
  • Allen, Pereira, L.S., Raes, D., Smith, M., 1998b. Crop evapotranspiration-Guidelines for computing crop water...
  • Allen, R., Pereira, L., Raes, D., Smith, M., 1998. Guidelines for computing crop water requirements-FAO Irrigation and...
  • J. Almorox et al.

    Global performance ranking of temperature-based approaches for evapotranspiration estimation considering Köppen climate classes

    J. Hydrol.

    (2015)
  • J. Annandale et al.

    Software for missing data error analysis of Penman-Monteith reference evapotranspiration

    Irrig. Sci.

    (2002)
  • L. Breiman

    Random forests

    Mach. Learn.

    (2001)
  • T. Cover et al.

    Nearest neighbor pattern classification

    IEEE Trans. Inf. Theory

    (1967)
  • H.A.R. De Bruin et al.

    Reference crop evapotranspiration determined with a modified Makkink equation

    Hydrol. Process.

    (1998)
  • F. Divina et al.

    Stacking ensemble learning for short-term electricity consumption forecasting

    Energies

    (2018)
  • E. Doğan

    Reference evapotranspiration estimation using adaptive neuro-fuzzy inference systems

    Irrig. Drain. J. Int. Comm. Irrig. Drain.

    (2009)
  • J. Fan et al.

    Climate change effects on reference crop evapotranspiration across different climatic zones of China during 1956–2015

    J. Hydrol.

    (2016)
  • W. Fang et al.

    Reference evapotranspiration forecasting based on local meteorological and global climate information screened by partial mutual information

    J. Hydrol.

    (2018)
  • Y. Feng et al.

    Modeling reference evapotranspiration using extreme learning machine and generalized regression neural network only with temperature data

    Comput. Electron. Agric.

    (2017)