Field-scale rice yield prediction from Sentinel-2 monthly image composites using machine learning algorithms
Introduction
Rice agriculture plays an important role in Taiwan's rural economy and culture. The crop directly feeds more than 23.5 million people (Andoko, 2020; Hsing, 2008). A large proportion of people living in the countryside rely on rice production and its agricultural products as a primary income source to sustain their livelihoods (Andoko, 2020; Yang, 2016). Due to the impacts of climate change, Taiwan has recently experienced more extreme weather events, including droughts, floods, and tropical storms, which have caused severe damage to rice-cultivated areas and a significant reduction in crop production (Lee, 2019; Shiau and Hsiao, 2012; Yeh, 2021). For instance, severe and prolonged droughts caused by precipitation deficits persisted on the island throughout the rice-growing seasons in 2020 and reduced rice production by at least 25% compared with the five-year average (USDA, 2020). Consequently, there is growing concern among scientists and agronomists in the country that changing temperature and rainfall patterns could be driving the increasing intensity and frequency of drought events, with negative effects on rice harvested areas and crop yields.
Because of such unprecedented changes in climate, government initiatives have been launched to monitor seasonal changes in cropping practices and to estimate rice yields and production, so that policymakers receive timely information for formulating strategies to tackle national food security issues. Rice crop yields have traditionally been estimated through costly and time-consuming field measurements. However, because only a limited number of field samples are available for regional interpolation, the estimates are usually inaccurate and unreliable until the rice crop is harvested. To reduce labor costs, crop simulation models, coupled with scenario data, have also been introduced to estimate and forecast rice crop yields for limited areas or experimental sites (Jha et al., 2019; Togliatti et al., 2017). Once their parameters are calibrated, these models can accurately predict future crop yields and offer opportunities to assess the resilience of crop yields to the effects of climate change. Their main disadvantage is that they require complicated and expensive biophysical inputs (e.g., rice genotype coefficients, weather factors, soil types, and information on crop management activities), which are unavailable in many regions around the world.
To overcome most of the limitations in capturing yield variability over a large region, remote sensing methods have been applied to crop yield estimation and prediction. They are deemed more cost-effective than field measurements and crop simulation models because satellite imagery offers wide coverage and high spatial and temporal resolutions (Arab et al., 2021; Islam et al., 2021; Khaki et al., 2021; Khalil and Abdullaev, 2021; Leroux et al., 2019; Ma et al., 2021; Vallentin et al., 2021). For example, the launch of the Sentinel-2 A/B twin satellites in 2015 and 2017 allows crop phenology to be exploited at the field level thanks to their high spatial and temporal resolutions (i.e., 10 m spatial resolution and a revisit cycle of 5 days), which is important for crop monitoring and yield modeling in Taiwan, where rice parcels are relatively small and fragmented. However, the use of optical satellite data is often hampered by the cloud cover commonly observed in the region, particularly during the rainy season. To address this, temporal pixel-based image compositing methods, such as the maximum and median value composite methods (Flood, 2013; Guerschman et al., 2009; Mountford et al., 2017), can be applied to mitigate effects such as cloud contamination, atmospheric attenuation, and surface directional reflectance (Holben, 1986; Huete et al., 2002; Roy et al., 2010). In this work, taking advantage of the high temporal resolution of Sentinel-2 data, we created monthly cloud-free image composites for rice yield modeling and prediction using the median value composite method. This method has the advantage of suppressing cloudy and shadowed pixels, which have relatively high and low reflectance values, respectively.
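The median value compositing idea can be sketched for a single pixel location as follows. This is only a minimal illustration and not the authors' processing chain: the function name `monthly_median_composite`, the `(date, reflectance, is_cloudy)` input layout, and all sample values are assumptions made for the example.

```python
from collections import defaultdict
from statistics import median


def monthly_median_composite(observations):
    """Return {(year, month): median reflectance} for one pixel.

    observations: iterable of (iso_date, reflectance, is_cloudy) tuples
    (a hypothetical layout chosen for this sketch). Observations flagged
    as cloudy are dropped before the median is taken; months without any
    clear observation are omitted from the result.
    """
    by_month = defaultdict(list)
    for iso_date, reflectance, is_cloudy in observations:
        if is_cloudy:
            continue
        year, month = int(iso_date[:4]), int(iso_date[5:7])
        by_month[(year, month)].append(reflectance)
    return {key: median(values) for key, values in sorted(by_month.items())}


# Hypothetical 5-day series for one pixel (values are illustrative only).
series = [
    ("2019-03-02", 0.21, False),
    ("2019-03-07", 0.85, False),  # unmasked cloud: abnormally bright
    ("2019-03-12", 0.23, False),
    ("2019-03-17", 0.04, False),  # unmasked shadow: abnormally dark
    ("2019-03-22", 0.25, False),
    ("2019-04-01", 0.90, True),   # flagged by the cloud mask, skipped
    ("2019-04-06", 0.31, False),
    ("2019-04-11", 0.35, False),
]
composite = monthly_median_composite(series)
print(composite)
```

In the March sample, the bright and dark outliers fall at the two ends of the sorted values, so the median returns a mid-range observation, whereas a mean would be pulled toward the contaminated values.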
The enhanced vegetation index (EVI), which was designed to overcome the saturation issue of the normalized difference vegetation index (NDVI), was applied in this research because it has been shown to be strongly correlated with crop biomass (Hatfield, 1983; Huete et al., 1997; Huete et al., 2002). Because the relationship between EVI and crop yields is nonlinear, three commonly used machine learning (ML) regression models, namely random forest (RF) (Breiman, 2001), support vector machine (SVM) (Cortes and Vapnik, 1995; Vapnik, 1999), and artificial neural networks (ANN) (McCulloch and Pitts, 1990; Rumelhart et al., 1986), were used in this work, and their accuracies were compared for yield modeling and prediction in the region. ML models can perform complex nonlinear regression tasks on large, multi-dimensional, and heterogeneous datasets. In addition, they can give more generalized solutions and are less prone to overfitting, provided that the training samples are pure or near-pure and sufficient in number. The algorithms learn from the training samples to identify trends and patterns in the datasets. Once trained, the models can accurately predict future instances (Boser et al., 1992; Breiman, 2001; Karsten et al., 2018; Lary et al., 2016; Liakos et al., 2018).
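For reference, EVI can be computed from near-infrared, red, and blue surface reflectance as sketched below. The coefficients (gain 2.5, 6, 7.5, and a canopy background adjustment of 1) are the standard values of the MODIS EVI formulation (Huete et al., 2002); that the same values were used here, and that Sentinel-2 bands 8, 4, and 2 supply the three inputs, are assumptions of this sketch. The function names and sample reflectances are hypothetical.

```python
def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, canopy_l=1.0):
    """Enhanced vegetation index from surface reflectance values in [0, 1].

    Default coefficients follow the standard MODIS EVI formulation;
    their use for this study is an assumption of the sketch.
    """
    denominator = nir + c1 * red - c2 * blue + canopy_l
    if denominator == 0:
        return float("nan")
    return gain * (nir - red) / denominator


def ndvi(nir, red):
    """Normalized difference vegetation index, shown for comparison."""
    total = nir + red
    if total == 0:
        return float("nan")
    return (nir - red) / total


# Hypothetical reflectances for a dense green canopy (illustrative only).
nir, red, blue = 0.45, 0.05, 0.03
print(f"EVI  = {evi(nir, red, blue):.3f}")
print(f"NDVI = {ndvi(nir, red):.3f}")
```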
The main objective of this research was to evaluate the applicability of monthly composites from Sentinel-2 imagery for rice yield prediction at the field-scale level using ML techniques in Taiwan. We also examined the hypothesis that a significant relationship exists between rice crop yields and the time-series EVI data from the heading (or booting) stage to the ripening stage, such that rice yield predictions can be made using ML models before the harvesting period in the study region.
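One simple way to probe such a relationship is to correlate field yields with the EVI of each monthly composite. The sketch below uses the Pearson correlation coefficient, which is an assumption made for illustration rather than the statistical test reported by the authors; the field records, month labels, and yield values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-field records: monthly EVI and yield in t/ha.
fields = [
    {"evi": {"Apr": 0.41, "May": 0.58, "Jun": 0.47}, "yield": 5.6},
    {"evi": {"Apr": 0.45, "May": 0.63, "Jun": 0.52}, "yield": 6.3},
    {"evi": {"Apr": 0.39, "May": 0.55, "Jun": 0.50}, "yield": 5.2},
    {"evi": {"Apr": 0.47, "May": 0.66, "Jun": 0.49}, "yield": 6.8},
]
yields = [f["yield"] for f in fields]
for month in ("Apr", "May", "Jun"):
    r = pearson_r([f["evi"][month] for f in fields], yields)
    print(month, round(r, 3))
```

Months whose EVI correlates strongly with yield are natural candidates for predictor variables in a pre-harvest model.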
Study region
The study region is situated in western Taiwan, covering approximately 671,772 ha (Fig. 1). We selected this region for yield investigation because it comprises four main rice-producing counties (i.e., Changhua, Yunlin, Chiayi, and Tainan), which annually contribute at least 30% of the country's total rice production. The landform of the region is characterized by alluvial plains to the west of the country's central mountain range, with an average elevation lower than 20 m above sea level, and mostly
Satellite data
The Sentinel-2 A/B top-of-atmosphere reflectance product (level-1C), acquired from the European Space Agency (ESA) (Liu et al., 2019; 147 images), was used in this study for rice yield modeling. The satellite data include 13 spectral bands, with wavelengths ranging from the visible to the shortwave infrared regions, and cloud masks indicating the presence of cirrus areas. The temporal resolution of the satellite data is 5 days, with a spatial resolution of 10 m (bands 2–4, and 8), 20 m
Satellite data pre-processing
The Sentinel-2 A/B images, in the form of top-of-atmosphere reflectance, were stored as digital numbers (DNs). Atmospheric correction and image resampling were performed to convert the DNs to surface reflectance (scaled from 0 to 1) using Sen2Cor (Main-Knorn et al., 2017), embedded in ESA's Sentinel Application Platform (SNAP) tool, version 8.0. In addition, because the 5-day Sentinel-2 data were often contaminated by clouds, frequently found in tropical and subtropical regions,
Temporal characteristics of monthly EVI profile
The median value composite method was applied to generate cloud-free monthly Sentinel-2 EVI data. An example of monthly EVI averages for rice cropping areas, extracted from the time-series EVI composites, shows that the profile preserved the magnitude of the temporal EVI data while characterizing temporal changes in the phenological stages of rice crops throughout the year (Fig. 4). For example, the rice cropping patterns in 2019 increased in EVI intensity after the rice
Conclusions
This research performed a comparative analysis of three ML models for predicting rice crop yields at the field-scale level from monthly composites of Sentinel-2 imagery. The comparison between predicted yields and the official statistics confirmed that rice crop yields were predictable one month before the harvest using ML models. The SVM model slightly outperformed RF and ANN. The testing results achieved by comparing the rice crop yields from field measurements with those
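A comparison between predicted and reference yields of this kind is commonly summarized with error statistics. The sketch below computes the root mean square error (RMSE) and mean absolute error (MAE); the excerpt does not state which measures the authors used, so the choice of metrics is an assumption, and the yield values are hypothetical rather than results from the study.

```python
from math import sqrt


def rmse(predicted, observed):
    """Root mean square error between paired predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sqrt(sum(squared) / len(squared))


def mae(predicted, observed):
    """Mean absolute error between paired predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Hypothetical yields in t/ha (illustrative only, not results from the study).
reference_yield = [5.6, 6.3, 5.2, 6.8]
predicted_yield = [5.9, 6.0, 5.4, 6.5]
print(f"RMSE = {rmse(predicted_yield, reference_yield):.3f} t/ha")
print(f"MAE  = {mae(predicted_yield, reference_yield):.3f} t/ha")
```

Running the same two functions on the predictions of each model against one shared reference set puts the models on a common scale for comparison.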
Declaration of Competing Interest
None.
Acknowledgement
This research was financed by the Taiwan Agricultural Research Institute (1103011) and the Taiwan Ministry of Science and Technology (109-2927-I-008-501). The financial support is gratefully acknowledged.
References (44)
- et al. Prediction of grape yields from time-series vegetation indices using satellite remote sensing and a machine-learning approach. Remote Sens. Appl. Soc. Environ. (2021)
- et al. Estimating fractional cover of photosynthetic vegetation, non-photosynthetic vegetation and bare soil in the Australian tropical savanna region upscaling the EO-1 Hyperion and MODIS sensors. Remote Sens. Environ. (2009)
- Remote sensing estimators of potential and actual crop yield. Remote Sens. Environ. (1983)
- et al. A comparison of vegetation indices over a global set of TM images for EOS-MODIS. Remote Sens. Environ. (1997)
- et al. Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sens. Environ. (2002)
- et al. Development of remote sensing-based yield prediction models at the maturity stage of boro rice using parametric and nonparametric approaches. Remote Sens. Appl. Soc. Environ. (2021)
- et al. Using daily data from seasonal forecasts in dynamic crop models for yield prediction: a case study for rice in Nepal’s Terai. Agric. For. Meteorol. (2019)
- et al. Neural network for grain yield predicting based multispectral satellite imagery: comparative study. Proc. Comput. Sci. (2021)
- et al. Machine learning in geosciences and remote sensing. Geosci. Front. (2016)
- et al. Maize yield estimation in West Africa from crop process-induced combinations of multi-domain remote sensing indices. Eur. J. Agron. (2019)