Ecological Informatics

Volume 36, November 2016, Pages 94-105
Evaluating temporal aggregation for predicting the sea surface temperature of the Atlantic Ocean

https://doi.org/10.1016/j.ecoinf.2016.10.004

Highlights

  • We evaluated temporal aggregation for SST prediction (daily, weekly and monthly).

  • We explored temporal aggregation under different prediction horizons and different training set sizes.

  • We used the ARIMA model to evaluate and analyze the benefits of temporal aggregation.

  • We measured the quality of predictions with and without temporal aggregation by comparing them with Random Walk.

  • We highlighted the influence of prediction horizon and training set size on the application of temporal aggregation to SST.

Abstract

Extreme environmental events such as droughts affect millions of people all around the world. Although it is not possible to prevent this type of event, predicting it under different time horizons makes it possible to mitigate the damage it causes. An important variable for identifying occurrences of droughts is the sea surface temperature (SST). In the tropical Atlantic Ocean, SST data are collected and provided by the Prediction and Research Moored Array in the Tropical Atlantic (PIRATA) project, an observation network composed of sensor buoys arranged in this region. Sensors of this type, and more generally Internet of Things (IoT) sensors, are commonly subject to data losses that affect the quality of the datasets collected for fitting prediction models. In this paper, we explore the influence of temporal aggregation on step-ahead SST prediction, considering different prediction horizons and different training dataset sizes. We conducted several experiments using data collected by the PIRATA project. Our results identify the combinations of training dataset size and prediction horizon for which temporally aggregated SST time series may or may not be beneficial for prediction.

Introduction

There is evidence in the literature pointing to a causal relationship between tropical southwestern Atlantic sea surface temperatures (SST) and the occurrence of extreme weather events (Cho et al., 2010, Fu et al., 2001, Lins et al., 2013, Moura and Shukla, 1981, Ronchail et al., 2002, Sukov et al., 2008). In particular, the variability of tropical Atlantic Ocean SST has a strong influence on the distribution of precipitation in Southern Tropical America (Nobre and Shukla, 1996), including northeastern Brazil (Hastenrath, 1984) and the southwestern Amazon region (Ronchail et al., 2002, Yoon and Zeng, 2009).

Among such extreme events, droughts are particularly relevant because of their economic impact. The causes of severe droughts in northeastern Brazil have been studied for a long time, and the current understanding is that the main factors contributing to this natural disaster are related to SST, including both the El Niño Southern Oscillation (ENSO) and the Intertropical Convergence Zone (ITCZ) (Durand et al., 2005, Hastenrath, 2011, Liu and Juárez, 2001, Moura and Shukla, 1981).

In Brazil, records of droughts and their socioeconomic impacts date back to the beginning of Portuguese colonization, but it was in the 17th century that governments began to take initiatives to mitigate their effects (Hastenrath, 2011). It is estimated that 32.8 million people have been affected by drought in the last thirty years, which led to a loss of approximately $2.4 billion for the country (EM-DAT, 2016).

In addition, other extreme events, such as Hurricane Catarina in southern Brazil in 2004 (Sukov et al., 2008), drew researchers' attention to the role played by SST variation in the Tropical South Atlantic. Catarina, which affected 150 thousand people, was the first hurricane ever registered in the south Atlantic and occurred under conditions very different from those usually observed.

There are also strong indications in the literature that SST plays an extremely important role in other phenomena that occur in the tropical Atlantic Ocean, among them: (i) the process that gives rise to tropical cyclones in the Atlantic Ocean (Sukov et al., 2008); (ii) rainfall in the Amazon region (Fu et al., 2001); (iii) the Amazon vegetative volume (Cho et al., 2010); (iv) carbon sequestration in the ocean (Gruber et al., 2002).

From the aforementioned characteristics, it is clear that monitoring south Atlantic SST time series and having a more accurate model of their evolution under different prediction horizons could allow the Brazilian government and society to better prepare for droughts or floods in northeastern Brazil and the Amazon basin (Ward and Folland, 1991). Although there are many works and models that focus on short-term SST prediction (Aguilar-Martinez and Hsieh, 2009, Hertig and Jacobeit, 2010, Lins et al., 2013, Wu et al., 2006), in this paper our goal is to analyze the usage of temporal aggregation for SST prediction under: (i) different prediction horizons and (ii) different sizes for training datasets.

Given the great diversity of extreme events and their commonly unstable nature, the analysis of (i) is often an important asset to studies that focus on the prediction of such events, since the desired prediction horizon depends on the characteristics of each natural event. The latter context (ii) is important since SST data are collected from different sources, such as sensors, which are not resilient to failures. In particular, since the advent of the Internet of Things (IoT) (Perera et al., 2014), many IoT devices have become vulnerable to communication interference and instrument or sensor malfunction (Atzori et al., 2010, Chen et al., 2015, Chen et al., 2014, Perera et al., 2014, Tsai et al., 2014). In the event of device failure, a significant number of observations may be missing from the SST time series. Addressing this problem is important for developing accurate prediction models (Salles et al., 2015). Two main approaches are commonly applied: imputation and prediction using subsequences. Imputation techniques (Yozgatligil et al., 2012) aim to fill an incomplete time series with plausible values for the missing observations, whereas prediction using subsequences consists of partitioning the available data into subsequences of uninterrupted observations to be used as training datasets for prediction models. This paper explores the latter approach to avoid imputation errors that could interfere with the analysis of our experimental results.
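The partitioning step behind prediction using subsequences can be illustrated with a minimal sketch. It is not the authors' implementation (they report working in R): the function name `split_uninterrupted`, the `min_length` parameter, the encoding of missing observations as `None` or NaN, and the sample readings are all assumptions made for illustration.

```python
import math


def split_uninterrupted(series, min_length=1):
    """Split a series into maximal runs of consecutive non-missing values.

    Assumption: missing observations are encoded as None or NaN.
    Runs shorter than `min_length` are discarded.
    """
    runs, current = [], []
    for value in series:
        missing = value is None or (
            isinstance(value, float) and math.isnan(value)
        )
        if missing:
            # A gap closes the current run, if it is long enough to keep.
            if len(current) >= min_length:
                runs.append(current)
            current = []
        else:
            current.append(value)
    # Keep the final run when the series does not end with a gap.
    if len(current) >= min_length:
        runs.append(current)
    return runs


# Hypothetical daily SST readings (degrees Celsius) with two sensor gaps.
daily_sst = [27.1, 27.3, None, 27.0, 26.9, 26.8, float("nan"), 27.4]
subsequences = split_uninterrupted(daily_sst, min_length=2)
```

Each returned run could then serve as a candidate training dataset; a larger `min_length` trades the number of usable runs for longer training sets.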

We give special attention to the pre-processing techniques applied to the SST data, among which we highlight temporal aggregation (Tiao, 1972). More specifically, this paper studies the influence of temporal aggregation on SST time series prediction in order to identify whether predicting an aggregated series brings benefits over directly predicting the original series, especially regarding the impact that aggregation may have on longer prediction horizons. Furthermore, we are interested in analyzing the contexts in which these benefits are most prominent, given different prediction horizons and sizes of training datasets. The effects of temporal aggregation on time series prediction have been extensively studied (Rostami-Tabar et al., 2013, Rostami-Tabar et al., 2014, Wei, 1978, Stram and Wei, 1986, Englund et al., 1999, Abraham, 1982, Nelson and Plosser, 1982, Tiao, 1972, Wei, 2005, Silvestrini and Veredas, 2008, Kourentzes et al., 2014, Petropoulos and Kourentzes, 2014, Kourentzes and Petropoulos, 2015, Athanasopoulos et al., 2015), and this research has shown that the advantages and disadvantages of temporal aggregation are strongly related to the statistical properties of the underlying non-aggregated time series. Properties such as stationarity, autocorrelation and seasonality have proven very important for the analysis of temporal aggregation; however, our statistical analysis revealed considerable disparity in these properties across SST time series. Thus, prior knowledge regarding the effects of temporal aggregation cannot be directly applied, and further study is necessary.
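As a rough illustration of what temporal aggregation does to a daily series, the sketch below averages consecutive, non-overlapping blocks of observations. Several choices here are assumptions rather than details taken from the paper: the use of the mean as the aggregation function, the fixed 30-day block standing in for a calendar month, the function name `aggregate_mean`, and the synthetic input values.

```python
def aggregate_mean(series, window):
    """Average consecutive, non-overlapping blocks of `window` observations.

    A trailing incomplete block is dropped so that every aggregate
    covers the same number of observations.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    n_blocks = len(series) // window
    return [
        sum(series[i * window:(i + 1) * window]) / window
        for i in range(n_blocks)
    ]


# Synthetic daily series of 364 values (52 full weeks); not real SST data.
daily = [26.0 + 0.01 * day for day in range(364)]

weekly = aggregate_mean(daily, window=7)    # 52 weekly aggregates
monthly = aggregate_mean(daily, window=30)  # 12 blocks of 30 days each
```

Aggregating shortens the series by a factor equal to the window, so a one-year horizon needs 364 daily steps but only 52 weekly or 12 monthly steps, at the cost of fewer training observations.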

The effects of temporal aggregation on prediction were evaluated using well-known linear models, namely the ARIMA (Box et al., 2008) and Random Walk models, used as performance baselines. We implemented both techniques in the R language (R Development Core Team, 2008). Both ARIMA and Random Walk are simple, well-established statistical models. Depending on the tuning of their parameters, they are able to produce a family of other linear models, including autoregressive and moving average models. When well tuned, ARIMA models are also able to handle non-stationarity and seasonality in time series. Furthermore, prediction models generated by ARIMA can inspire greater confidence in data scientists, since linear models are interpretable, in contrast to some state-of-the-art machine learning methods that are normally black-box models. These characteristics indicate the potential of these models in studies in which further behavioral analysis of the time series is of interest.
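A minimal sketch of how a forecast can be compared against the Random Walk baseline follows. The Random Walk forecast simply repeats the last observed value over the whole horizon. The error measure (a ratio of mean squared errors), the function names, and all numbers are assumptions chosen for illustration; the `candidate` list is a hand-written stand-in for the output of a fitted model such as ARIMA, not a real ARIMA forecast.

```python
def random_walk_forecast(train, horizon):
    """Random Walk (naive) forecast: repeat the last observed value."""
    return [train[-1]] * horizon


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def relative_mse(actual, candidate, baseline):
    """MSE of the candidate divided by MSE of the baseline.

    A ratio below 1 means the candidate beats the baseline.
    """
    return mse(actual, candidate) / mse(actual, baseline)


# Hypothetical training and test observations (degrees Celsius).
train = [26.8, 26.9, 27.0, 27.1]
test = [27.2, 27.3, 27.4]

baseline = random_walk_forecast(train, horizon=len(test))
candidate = [27.2, 27.2, 27.3]  # stand-in for a model-based forecast
ratio = relative_mse(test, candidate, baseline)
```

In this toy case the ratio is below 1, so the stand-in forecast outperforms the Random Walk; the same comparison can be repeated for each horizon and training set size.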

Data corresponding to daily SST time series obtained from the Prediction and Research Moored Array in the Tropical Atlantic (PIRATA) project (GOOS-Brasil, 2015) was used in our experiments as input to the prediction models and to the analysis of the performed algorithms. Our experiments show that the benefits of applying temporal aggregation to SST prediction were influenced by the prediction horizon and the size of the training dataset.

Besides this introduction, this paper is organized in five more sections. Section 2 describes related work. Section 3 presents some background related to SST prediction, including temporal aggregation. Section 4 presents the methodology we applied for SST prediction. Section 5 presents our experimental evaluation. Section 6 concludes.

Related work

Many works refer to the tropical Atlantic Ocean and the issue of predicting SST has been extensively studied in the literature due to the great influence of this environmental variable in the climatic conditions of many regions throughout the globe. Most previous work concerns short-term SST predictions based on meteorological variables and general climate models, such as the HYbrid Coordinate Ocean Model (HYCOM) and NCEP Climate Forecast System (CFSv2). Some works that fall into this category

General concepts

A time series x_t is a sequence ⟨x_1, x_2, x_3, ⋯, x_n⟩ of observations from a phenomenon of interest collected over time, such that x_1 corresponds to the value of the first (oldest) observation and x_n is the last (most recent) one. The length n of a time series x_t is represented as |x_t| = n.

Most empirical work with time series assumes an underlying stationary process (Gujarati, 2002). A stationary time series x_t is a stochastic process such that: (i) its mean function x̄_t is constant and does not depend

Methodology

The methodology we present here enables the evaluation of temporal aggregation for predicting the SST of the Atlantic Ocean. We assume the availability of various instances of training and testing sets of time series observations at higher frequency (daily time series). We aim at predicting X_{T_0+j} (j = 1, ⋯, N) for a corresponding aggregated time series (weekly and monthly) to allow for one-year-ahead forecast. Thus, we intend to obtain, respectively, 52 and 12 future weekly and monthly aggregates,

Dataset

We performed several experiments using daily time series collected by the Prediction and Research Moored Array in the Tropical Atlantic (PIRATA) project. This project corresponds to an observation network composed of twenty-one buoys spread across the tropical Atlantic Ocean and devised to monitor a series of variables of the ocean-atmosphere interaction processes (Bourlès et al., 2008, Servain et al., 1998). The buoys adopted by the program are known as Autonomous Temperature Line

Conclusions

Predicting SST of the Atlantic Ocean is important for governmental agencies and society to get ready for future occurrences of extreme events, such as droughts. Improving the prediction of SST in different horizons becomes a key issue. This paper evaluated the use of temporal aggregation and its consequences in different prediction horizons for SST. In addition to the daily SST data coming from the PIRATA project, we have modeled and evaluated weekly and monthly derived time series to assess

Acknowledgments

The authors thank CNPq, CAPES, and FAPERJ for partially funding this research.

References (64)

  • S. Aguilar-Martinez et al.

    Forecasts of tropical Pacific sea surface temperatures by neural networks and support vector regression

    Int. J. Oceanol. Limnol.

    (2009)
  • G. Athanasopoulos et al.

    Forecasting with temporal hierarchies

  • B. Bourlès et al.

    The Pirata program: history, accomplishments, and future directions

    Bull. Am. Meteorol. Soc.

    (2008)
  • G.E.P. Box et al.

    Time Series Analysis: Forecasting and Control

    (2008)
  • S. Chattopadhyay

    Feed forward artificial neural network model to predict the average summer-monsoon rainfall in India

    Acta Geophys.

    (2007)
  • F. Chen et al.

    Data mining for the internet of things: literature review and challenges

    Int. J. Distrib. Sens. Netw.

    (2015)
  • M. Chen et al.

    Big data: a survey

    Mob. Netw. Appl.

    (2014)
  • Z. Chen et al.

    Assessing forecast accuracy measures

    Prepr. Ser.

    (2004)
  • H. Cheng et al.

    Multistep-ahead time series prediction

  • D.C. Collins et al.

    Predictability of Indian Ocean sea surface temperature using canonical correlation analysis

    Clim. Dyn.

    (2004)
  • B. Durand et al.

    Tropical Atlantic moisture flux, convection over northeastern Brazil, and pertinence of the PIRATA network

    J. Clim.

    (2005)
  • The International Disasters Database

    (2016)
  • P. Englund et al.

    The choice of methodology for computing housing price indexes: comparisons of temporal aggregation and sample definition

    J. Real Estate Financ. Econ.

    (1999)
  • S. Feng et al.

    Influence of Atlantic sea surface temperatures on persistent drought in North America

    Clim. Dyn.

    (2010)
  • R. Fu et al.

    How do tropical sea surface temperatures influence the seasonal distribution of precipitation in the equatorial Amazon?

    J. Clim.

    (2001)
  • PIRATA dataset

  • N. Gruber et al.

    Interannual variability in the North Atlantic Ocean carbon sink

    Sci.

    (2002)
  • D. Gujarati

    Basic Econometrics

    (2002)
  • S. Hastenrath

    Interannual variability and annual cycle: mechanisms of circulation and climate in the tropical Atlantic sector

    Mon. Weather Rev.

    (1984)
  • S. Hastenrath

    Exploring the climate problems of Brazil’s Nordeste: a review

    Clim. Chang.

    (2011)
  • E. Hertig et al.

    Predictability of Mediterranean climate variables from oceanic variability. Part II: statistical models for monthly precipitation and temperature in the Mediterranean area

    Clim. Dyn.

    (2010)
  • Z.-Z. Hu et al.

    Prediction skill of monthly SST in the North Atlantic Ocean in NCEP Climate Forecast System version 2

    Clim. Dyn.

    (2012)