Field-scale rice yield prediction from Sentinel-2 monthly image composites using machine learning algorithms

https://doi.org/10.1016/j.ecoinf.2022.101618

Highlights

  • This study performed rice crop yield predictions using Sentinel-2 image composites.

  • Data were processed for 2019–2020 crop seasons using machine learning algorithms.

  • SVM slightly outperformed RF and ANN for rice yield predictions at the field level.

  • Methods can be used for rice yield prediction one month before harvesting in Taiwan.

Abstract

Machine learning (ML), combined with a high volume of satellite images, offers agronomists an alternative approach to crop yield prediction for decision support systems. This research explored the possibility of using monthly image composites from Sentinel-2 imagery to predict rice crop yields at the field level in Taiwan one month before the harvesting period using ML techniques. Three ML models, namely random forest (RF), support vector machine (SVM), and artificial neural networks (ANN), were designed to address the research question of yield prediction in four consecutive growing seasons from 2019 to 2020 using field survey data. The yield modeling and prediction results showed that SVM slightly outperformed RF and ANN. The results of model validation, obtained from SVM models using the data from transplanting to ripening, showed that the root mean square percentage error (RMSPE) and the mean absolute percentage error (MAPE) values were 5.5% and 4.5% for the 2019 second crop, and 4.7% and 3.5% for the 2020 first crop, respectively. The yield predictions (obtained from SVM) for the 2019 second crop and the 2020 first crop, evaluated against the government statistics, indicated a close agreement between the two datasets, with RMSPE and MAPE values generally smaller than 11.2% and 9.2%. The SVM model, with the configuration parameters used for rice crop yield prediction, produced satisfactory results. The comparison between the predicted yields and the official statistics showed slight underestimations, with RMSPE and MAPE values of 9.4% and 7.1% for the 2019 first crop (hindcast), and 11.0% and 9.4% for the 2020 second crop (forecast), respectively. This study has demonstrated the validity of our methods for yield modeling and prediction from monthly composites of Sentinel-2 imagery using ML algorithms. The findings could be useful for agronomists in formulating timely action plans to address national food security issues.
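The two error measures reported above can be illustrated with a minimal Python sketch. It assumes the standard definitions of RMSPE and MAPE (errors expressed relative to the observed yield); the snippet text does not state the exact formulas used by the authors, and the yield values below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math


def rmspe(observed, predicted):
    """Root mean square percentage error, in percent.

    Assumes the common definition: the square root of the mean of
    squared relative errors, with the observed value as denominator.
    """
    ratios = [(p - o) / o for o, p in zip(observed, predicted)]
    return 100.0 * math.sqrt(sum(r * r for r in ratios) / len(ratios))


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (same assumption)."""
    ratios = [abs(p - o) / o for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical field yields in kg/ha (not data from the study).
observed = [5000.0, 6000.0, 8000.0]
predicted = [5500.0, 5700.0, 8000.0]

print(f"RMSPE = {rmspe(observed, predicted):.2f}%")
print(f"MAPE  = {mape(observed, predicted):.2f}%")
```

Because RMSPE squares each relative error before averaging, it is never smaller than MAPE on the same data, which is consistent with the ordering of the paired values reported in the abstract.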

Introduction

Rice agriculture plays an important role in Taiwan's rural economy and culture. This crop directly feeds more than 23.5 million people (Andoko, 2020; Hsing, 2008). A large proportion of people living in the countryside rely on rice production and its agricultural products as a primary income source to sustain their livelihoods (Andoko, 2020; Yang, 2016). Due to the impacts of climate change through global warming, Taiwan has recently experienced more extreme weather events, including droughts, floods, and tropical storms, leading to severe damage to rice-cultivated areas and a significant reduction in crop production (Lee, 2019; Shiau and Hsiao, 2012; Yeh, 2021). For instance, severe and prolonged droughts caused by precipitation deficits occurred on the island throughout the rice-growing seasons in 2020 and resulted in a decrease of at least 25% in rice production compared with the five-year average (USDA, 2020). Therefore, there has been growing concern among scientists and agronomists in the country that changing temperature and rainfall patterns could be driving the increasing intensity and frequency of drought events, with negative effects on rice harvested areas and crop yields.

Because of such unprecedented changes in climate, government initiatives have been launched to seasonally monitor changes in cropping practices and estimate rice yields and production, so that policymakers can formulate timely strategies to tackle national food security issues. Efforts have been made to estimate rice crop yields through costly and time-consuming field measurements. However, because only a limited number of field samples are available for regional interpolation, the estimates are usually inaccurate and unreliable until the rice crop is harvested. To reduce labor costs, crop simulation models, coupled with scenario data, have also been introduced to estimate and forecast rice crop yields for limited areas or experimental sites (Jha et al., 2019; Togliatti et al., 2017). The advantages of these models are that they can accurately predict future crop yields and offer opportunities to assess crop yield resilience to the effects of climate change, provided that the model parameters are calibrated. However, they also have disadvantages: they require complicated and expensive inputs of biophysical factors (e.g., rice genotype coefficients, weather factors, soil types, and information on crop management activities), which are usually unavailable in many regions around the world.

To overcome most of the limitations in capturing yield variability over a large region, remote sensing methods have been applied to crop yield estimation and prediction because they are deemed to be more cost-effective than field measurements and crop simulation models, owing to the advantages of satellite imagery, such as wide coverage and high spatial and temporal resolutions (Arab et al., 2021; Islam et al., 2021; Khaki et al., 2021; Khalil and Abdullaev, 2021; Leroux et al., 2019; Ma et al., 2021; Vallentin et al., 2021). For example, the launch of the Sentinel-2 A/B twin satellites in 2015 and 2017 allows us to exploit crop phenology at the field level due to their high spatial and temporal resolutions (i.e., 10 m spatial resolution and a revisit cycle of 5 days), which is important for crop monitoring and yield modeling in Taiwan, where rice parcels are relatively small and fragmented. However, the use of optical satellite data often faces challenges due to the cloud cover commonly observed in the region, particularly during the rainy season. To address this, temporal pixel-based image composite methods, such as the maximum and median value composite methods (Flood, 2013; Guerschman et al., 2009; Mountford et al., 2017), can be applied to mitigate effects including cloud contamination, atmospheric attenuation, and surface directional reflectance (Holben, 1986; Huete et al., 2002; Roy et al., 2010). In this work, taking advantage of the high temporal resolution of Sentinel-2 data, we created monthly cloud-free image composites for rice yield modeling and prediction using the median value composite method. This method has the advantage of suppressing cloud and shadow areas, which have relatively high and low reflectance values, respectively.
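The idea behind median value compositing can be sketched in a few lines of Python. This is a simplified illustration, not the authors' processing chain: the function name `monthly_median_composite`, the nested-list image layout, the use of `None` for missing observations, and the reflectance values are all assumptions made for the example.

```python
from statistics import median


def monthly_median_composite(scenes):
    """Per-pixel median over all acquisitions within one month.

    scenes: list of 2D lists (rows x cols) of reflectance values;
            None marks a pixel with no usable observation.
    Returns a 2D list; a pixel is None if it was never observed.
    """
    n_rows, n_cols = len(scenes[0]), len(scenes[0][0])
    composite = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Collect this pixel's valid values across the month.
            values = [s[r][c] for s in scenes if s[r][c] is not None]
            row.append(median(values) if values else None)
        composite.append(row)
    return composite


# Five hypothetical acquisitions of a 1 x 2 pixel image.
scenes = [
    [[0.21, 0.80]],  # bright cloud over the second pixel
    [[0.22, 0.30]],
    [[0.05, 0.31]],  # dark shadow over the first pixel
    [[0.23, None]],  # second pixel missing
    [[0.24, 0.32]],
]
composite = monthly_median_composite(scenes)
print(composite)
```

In this toy example the cloud value (0.80) and the shadow value (0.05) sit at the extremes of each pixel's time series, so the median ignores them as long as fewer than half of the observations in the month are contaminated.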

The enhanced vegetation index (EVI), which is designed to overcome the saturation issue of the normalized difference vegetation index (NDVI), was applied in this research because it has been shown to be strongly correlated with crop biomass (Hatfield, 1983; Huete et al., 1997; Huete et al., 2002). Because the relationship between EVI and crop yields is nonlinear, three commonly used ML regression models, namely random forest (RF) (Breiman, 2001), support vector machine (SVM) (Cortes and Vapnik, 1995; Vapnik, 1999), and artificial neural networks (ANN) (McCulloch and Pitts, 1990; Rumelhart et al., 1986), were used in this work to compare their accuracy for yield modeling and prediction in the region. The advantages of ML models are that they can perform complex nonlinear regression tasks with a large amount of multi-dimensional and multi-variety datasets. In addition, they can give more generalized solutions and are less prone to overfitting, provided that the training samples are pure or near-pure and that the number of training samples is adequate for model training. The algorithms learn from the training samples to identify trends and patterns in the datasets. Once the models are trained, they can be used to predict future instances (Boser et al., 1992; Breiman, 2001; Karsten et al., 2018; Lary et al., 2016; Liakos et al., 2018).
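For reference, the EVI computation can be sketched as follows. The coefficients are those of the widely used formulation in Huete et al. (2002); the mapping to Sentinel-2 bands (B8 for near-infrared, B4 for red, B2 for blue) and the sample reflectance values are assumptions for illustration, since this excerpt does not state which bands the authors used.

```python
def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, canopy_l=1.0):
    """Enhanced vegetation index from surface reflectance (0 to 1).

    EVI = G * (NIR - Red) / (NIR + C1 * Red - C2 * Blue + L)
    with G = 2.5, C1 = 6, C2 = 7.5, L = 1 (Huete et al., 2002).
    """
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + canopy_l)


def ndvi(nir, red):
    """Normalized difference vegetation index, shown for comparison."""
    return (nir - red) / (nir + red)


# Hypothetical reflectance of a dense rice canopy
# (assumed Sentinel-2 bands: B8 = NIR, B4 = red, B2 = blue).
nir, red, blue = 0.40, 0.05, 0.03
print(f"EVI  = {evi(nir, red, blue):.3f}")
print(f"NDVI = {ndvi(nir, red):.3f}")
```

The blue-band term and the constant L in the denominator are what distinguish EVI from NDVI; they reduce atmospheric and soil background influences and keep the index responsive at high biomass, where NDVI tends to level off.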

The main objective of this research was to evaluate the applicability of monthly composites of Sentinel-2 imagery for rice yield prediction at the field-scale level using ML techniques in Taiwan. We also examined the hypothesis that a significant relationship exists between rice crop yields and the time-series EVI data around the heading or booting to ripening stages, so that rice yield predictions can be made using ML models before the harvesting period in the study region.

Study region

The study region is situated in western Taiwan, covering approximately 671,772 ha (Fig. 1). We selected this region for yield investigation because it comprises four main rice-producing counties (i.e., Changhua, Yunlin, Chiayi, and Tainan), which annually contribute at least 30% of the country's total rice production. The landform of the region is characterized by alluvial plains to the west of the country's central mountain range, with an average elevation lower than 20 m above sea level, and mostly

Satellite data

The Sentinel-2 A/B top-of-atmosphere reflectance product (level-1C), acquired from the European Space Agency (ESA) from (Liu et al., 2019)147 images), was used in this study for rice yield modeling. The satellite data include 13 spectral bands, with wavelengths ranging from the visible to shortwave infrared regions, and cloud masks indicating the presence of cirrus areas. The temporal resolution of the satellite data is 5 days, with a spatial resolution of 10 m (bands 2–4, and 8), 20 m

Satellite data pre-processing

The Sentinel-2 A/B images, in the form of top-of-atmosphere reflectance, were stored as digital numbers (DNs). Atmospheric correction and image resampling were performed to convert the DNs to surface reflectance (scaled from 0 to 1) using Sen2Cor (Main-Knorn et al., 2017), embedded in the ESA's Sentinel Application Platform (SNAP) tool version 8.0. In addition, because the 5-day Sentinel-2 data were often contaminated by clouds, frequently found in tropical and subtropical regions,

Temporal characteristics of monthly EVI profile

The median value composite method was applied to generate cloud-free monthly Sentinel-2 EVI data. An example of monthly EVI averages for rice cropping areas, extracted from the time-series EVI composites, shows that the profile preserved the magnitude of the temporal EVI data while characterizing temporal changes in the phenological stages of rice crops throughout the year (Fig. 4). For example, the rice cropping patterns in 2019 increased in EVI intensity after the rice

Conclusions

This research performed a comparative analysis of three ML models for predictions of rice crop yields at the field-scale level from monthly composites from Sentinel-2 imageries. The comparison results, between predicted yield and the official statistics, confirmed that rice crop yields were predictable one month before the harvest using ML models. The SVM model slightly outperformed RF and ANN. The testing results achieved by comparing the rice crop yields from field measurements with those

Declaration of Competing Interest

None.

Acknowledgement

This research was financed by the Taiwan Agricultural Research Institute (1103011) and the Taiwan Ministry of Science and Technology (109-2927-I-008-501). The financial support is gratefully acknowledged.

References (44)

  • C.-A. Liu et al. Research advances of SAR remote sensing for agriculture applications: a review. J. Integr. Agric. (2019)

  • Y. Ma et al. Corn yield prediction and uncertainty analysis based on remotely sensed variables using a Bayesian neural network approach. Remote Sens. Environ. (2021)

  • W. McCulloch et al. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biol. (1990)

  • G.L. Mountford et al. Chapter 4 - sensitivity of vegetation phenological parameters: From satellite sensors to spatial resolution and temporal compositing period

  • D.P. Roy et al. Web-enabled Landsat data (WELD): Landsat ETM+ composited mosaics of the conterminous United States. Remote Sens. Environ. (2010)

  • K. Togliatti et al. How does inclusion of weather forecasting impact in-season crop model predictions? Field Crop Res. (2017)

  • P. Toscano et al. Durum wheat modeling: the Delphi system, 11 years of observations in Italy. Eur. J. Agron. (2012)

  • M.D. Wilson. Support vector machines

  • E. Andoko. Review of Taiwan’s Food Security Strategy. FFTC Agricultural Policy Platform (2020)

  • C.M. Bishop. Neural Networks for Pattern Recognition (1995)

  • B.E. Boser et al. A training algorithm for optimal margin classifiers

  • L. Breiman. Random forests. Mach. Learn. (2001)