What is the best RNN-cell structure to forecast each time series behavior?

https://doi.org/10.1016/j.eswa.2022.119140

Highlights

  • A taxonomy of time series behaviors was proposed.

  • A taxonomy of Recurrent Neural Network (RNN) cell structures was proposed.

  • A set of 31 RNN cell structures was evaluated.

  • The utility of each component in the Long–Short Term Memory cell was evaluated.

  • The best cell structure for forecasting each time series behavior was provided.

Abstract

It is unquestionable that time series forecasting is of paramount importance in many fields. The machine learning models most widely used to address time series forecasting tasks are Recurrent Neural Networks (RNNs). Typically, those models are built using one of the three most popular cells, the ELMAN, Long–Short Term Memory (LSTM), or Gated Recurrent Unit (GRU) cell; each cell has a different structure and implies a different computational cost. However, it is not clear why and when to use each RNN-cell structure. In fact, there is no comprehensive characterization of all the possible time series behaviors and no guidance on which RNN cell structure is the most suitable for each behavior. The objective of this study is two-fold: it presents a comprehensive taxonomy of all possible time series behaviors (deterministic, random-walk, nonlinear, long-memory, and chaotic), and it provides insights into the best RNN cell structure for each time series behavior. We conducted two experiments: (1) the first evaluates and analyzes the role of each component in the LSTM-Vanilla cell by creating 11 variants, each based on one alteration of its basic architecture (removing, adding, or substituting one cell component); (2) the second evaluates and analyzes the performance of 20 possible RNN-cell structures. To evaluate, compare, and select the best model, different statistical metrics were used: error-based metrics, information criterion-based metrics, a naïve-based metric, and a direction change-based metric. To further improve our confidence in the models’ interpretation and selection, the Friedman test followed by the Wilcoxon–Holm signed-rank procedure was used.

Our results advocate the use and further exploration of the newly created RNN variant, named SLIM, in time series forecasting, thanks to its high ability to accurately predict the different time series behaviors as well as its simple structural design, which does not require expensive temporal and computing resources.
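As a concrete illustration of two of the metric families named above, a naïve-based metric and a direction change-based metric can be computed as follows (a minimal NumPy sketch; both function definitions are our own illustrative versions, not the paper's exact formulas):

```python
import numpy as np

def mase(y_true, y_pred, y_train):
    # Naive-based metric: mean absolute forecast error scaled by the
    # in-sample error of the one-step naive forecast (an illustrative MASE).
    scale = np.mean(np.abs(np.diff(y_train)))
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))) / scale

def direction_change_accuracy(y_true, y_pred):
    # Direction change-based metric: fraction of steps where the forecast
    # moves in the same direction as the actual series.
    return np.mean(np.sign(np.diff(y_true)) == np.sign(np.diff(y_pred)))
```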

Introduction

Many real-world prediction problems involve a temporal dimension and typically require the estimation of numerical sequential data, a task referred to as time series forecasting. Time series forecasting is a cornerstone of data science, playing a pivotal role in almost all domains, including meteorology (Murat, Malinowska, Gos, & Krzyszczak, 2018), natural disasters control (Erdelj, Król, & Natalizio, 2017), energy (Bourdeau, Zhai, Nefzaoui, Guo, & Chatellier, 2019), manufacturing (Wang & Chen, 2018), finance (Liu, 2019), econometrics (Siami-Namini & Namin, 2018), telecommunication (Maeng, Kim, & Shin, 2020), and healthcare (Khaldi, El Afia, & Chiheb, 2019b), to name a few. Accurate time series forecasting requires robust forecasting models.

Currently, Recurrent Neural Network (RNN) models are among the most popular machine learning models for sequential data modeling, including natural language, image/video captioning, and forecasting (Chimmula and Zhang, 2020, Sutskever et al., 2014, Vinyals et al., 2015). Such RNN models are built as a sequence of the same cell structure, for example, the ELMAN cell, the Long–Short Term Memory (LSTM) cell, or the Gated Recurrent Unit (GRU) cell. The simplest RNN cell is ELMAN, which includes a single layer of hidden neurons, whereas the LSTM and GRU cells incorporate a gating mechanism (three gates in LSTM, two in GRU), where each gate is a layer of hidden neurons. Many other cell structures have been introduced in the literature (Lu and Salem, 2017, Mikolov et al., 2014, Pulver and Lyu, 2017, Zhou et al., 2016). However, to solve time series forecasting tasks, the building of RNN models is typically limited to the three aforementioned cell structures (Alkhayat and Mehmood, 2021, Liu et al., 2021, Rajagukguk et al., 2020, Runge and Zmeureanu, 2021, Sezer et al., 2020), as they provide very good accuracy (Runge and Zmeureanu, 2021, Sezer et al., 2020).
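The structural contrast between an ungated and a gated cell is easiest to see in a single forward step. The following is a minimal NumPy sketch of our own (not the authors' implementation) of an ELMAN step and a GRU step:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elman_step(x_t, h_prev, Wx, Wh, b):
    # ELMAN: a single layer of hidden neurons, no gates.
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

def gru_step(x_t, h_prev, P):
    # GRU: two gates (update z and reset r), each itself a layer of neurons.
    z = sigmoid(P["Wz"] @ x_t + P["Uz"] @ h_prev + P["bz"])   # update gate
    r = sigmoid(P["Wr"] @ x_t + P["Ur"] @ h_prev + P["br"])   # reset gate
    h_cand = np.tanh(P["Wh"] @ x_t + P["Uh"] @ (r * h_prev) + P["bh"])
    return (1.0 - z) * h_prev + z * h_cand                    # gated interpolation
```

Each extra gate adds a full layer of weights, which is where the computational-cost differences between ELMAN, GRU, and LSTM come from.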

Nevertheless, building robust RNN models for time series forecasting is still a challenging task, as there is not yet a clear understanding of time series data itself, and hence very little knowledge exists about which cell structure is the most appropriate for each data type. In general, when facing a new problem, practitioners select one of the most popular cells, usually LSTM, and use it as a building block for the RNN model without any guarantee of the appropriateness of this cell for the data at hand. The objective of this work is two-fold. It presents a comprehensive characterization of time series behaviors and provides guidelines on the best RNN cell structure for each behavior. As far as we know, this is the first work providing such insights. The main contributions of this study can be summarized as follows:

  • To provide a better understanding of time series data by presenting a comprehensive characterization of their behaviors.

  • To determine the most appropriate cell structure for each time series behavior (i.e., whether a specific cell structure should be avoided for certain behaviors).

  • To identify differences in predictability between behaviors (i.e., whether certain behaviors are easier or harder to predict across all cell models).

  • To provide useful guidelines that can assist decision-makers and scholars in selecting the most suitable RNN-cell structure from both a computational and a performance point of view.

The remainder of this study is organized as follows: Section 2 reviews related work. Section 3 presents a taxonomy of time series behaviors. Section 4 presents a taxonomy of RNN cells. Section 5 describes the experiments. Section 6 presents and discusses the obtained results. Finally, the last section concludes the findings and sheds light on future research directions.

Section snippets

Related works

The last decades have seen an explosion of time series data acquired by automated data collection devices such as monitors, IoT devices, and sensors (Bourdeau et al., 2019, Erdelj et al., 2017, Murat et al., 2018). The collected time series describe different quantitative values: stock price, amount of sales, electricity load demand, weather temperature, etc. In parallel, a large number of comparative studies have been carried out in the forecasting area (Athiyarath et al., 2020, Bianchi et

Taxonomy of times series behaviors

As far as we know, this is the first work introducing a complete formal characterization of real-world time series. Time series emerging from real-world applications can either follow a stochastic mechanism or a chaotic mechanism and are usually contaminated by white noise (Boaretto et al., 2021, Box et al., 2015, Cencini et al., 2000, Wales, 1991, Zunino et al., 2012).
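In symbols (illustrative notation of our own, paraphrasing this framing), an observed value combines the underlying generating mechanism with additive white noise:

```latex
y_t = s_t + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2)
```

where the signal $s_t$ may be deterministic (e.g., periodic), stochastic (e.g., a random walk, $s_t = s_{t-1} + \eta_t$), or chaotic (e.g., a logistic-map orbit, $s_{t+1} = r\,s_t(1 - s_t)$).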

Taxonomy of RNN cells

Humans do not start their thinking from zero every second; our thoughts persist in the memory of our brains. For example, as readers work through this paper, they understand each word based on their understanding of the words before. The absence of memory is the major shortcoming of traditional machine learning models, particularly feed-forward neural networks (FNNs). To overcome this limitation, RNNs integrate the concept of feedback connections into their structure (Fig. 6, where xt
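In compact form (standard RNN notation, supplied here since Fig. 6 is not reproduced in this snippet), the feedback connection makes the hidden state depend on both the current input and the previous hidden state:

```latex
h_t = f\left(W_x x_t + W_h h_{t-1} + b_h\right), \qquad \hat{y}_t = g\left(W_y h_t + b_y\right)
```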

Experimental structure

Two experiments have been carried out in this study. The first experiment analyzes the utility of each LSTM-Vanilla cell component in forecasting the five time series behaviors, while the second experiment evaluates different variants of RNN cell structures in forecasting these behaviors. In this section, we first describe the process we followed to generate the dataset for each time series behavior (Section 5.1). Then, we present the selected models for the first and the second experiment
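The full generation procedure is in Section 5.1 and is not reproduced in this snippet; the following is a minimal sketch of how the five behaviors could be synthesized (every generator choice and parameter below is our own assumption, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Deterministic: a noiseless periodic signal.
deterministic = np.sin(2 * np.pi * np.arange(n) / 50)

# Random walk: cumulative sum of white noise.
random_walk = np.cumsum(rng.normal(size=n))

# Nonlinear: a threshold AR(1) process whose coefficient depends on the regime.
nonlinear = np.zeros(n)
for t in range(1, n):
    phi = 0.9 if nonlinear[t - 1] > 0 else -0.3
    nonlinear[t] = phi * nonlinear[t - 1] + rng.normal()

# Long-memory: ARFIMA(0, d, 0) via truncated fractional-differencing weights.
d = 0.4
k = np.arange(1, n)
psi = np.cumprod((k - 1 + d) / k)   # psi_j = Gamma(j + d) / (Gamma(d) Gamma(j + 1))
eps = rng.normal(size=n)
long_memory = np.empty(n)
long_memory[0] = eps[0]
for t in range(1, n):
    long_memory[t] = eps[t] + psi[:t] @ eps[t - 1::-1]

# Chaotic: logistic map in its chaotic regime (r = 4).
chaotic = np.zeros(n)
chaotic[0] = 0.3
for t in range(1, n):
    chaotic[t] = 4.0 * chaotic[t - 1] * (1.0 - chaotic[t - 1])
```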

Results and discussion

In this section, we present the results of the two conducted experiments: (1) The first experiment consists of evaluating and analyzing the role of each component in the LSTM-Vanilla cell with respect to the five time series behaviors. The evaluated architectures were generated by removing (NIG, NFG, NOG, NIAF, NFAF, NOAF, and NCAF), adding (PC and FGR), or substituting (FB1 and CIFG) one cell component. (2) The second experiment aims at evaluating and analyzing the performance of a multitude
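To make the variant naming concrete, the following sketches one LSTM-Vanilla step together with the CIFG substitution (a minimal NumPy rendering of the standard LSTM equations; the expansion of the acronyms, e.g., NIG = no input gate, NFG = no forget gate, NOG = no output gate, CIFG = coupled input–forget gate, is our reading, not quoted from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, P, coupled=False):
    # One LSTM-Vanilla step; coupled=True ties the input gate to the
    # forget gate (the CIFG substitution).
    f = sigmoid(P["Wf"] @ x_t + P["Uf"] @ h_prev + P["bf"])      # forget gate
    if coupled:
        i = 1.0 - f                                              # CIFG: i := 1 - f
    else:
        i = sigmoid(P["Wi"] @ x_t + P["Ui"] @ h_prev + P["bi"])  # input gate (dropped in NIG)
    o = sigmoid(P["Wo"] @ x_t + P["Uo"] @ h_prev + P["bo"])      # output gate (dropped in NOG)
    g = np.tanh(P["Wg"] @ x_t + P["Ug"] @ h_prev + P["bg"])      # input activation (NIAF drops tanh)
    c_t = f * c_prev + i * g          # cell state update (NFG drops the f term)
    h_t = o * np.tanh(c_t)            # output activation (NOAF drops this tanh)
    return h_t, c_t
```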

Conclusions

In this paper, we proposed a comprehensive taxonomy of the main time series behaviors, which are: deterministic, random-walk, nonlinear, long-memory, and chaotic. Then, we conducted two experiments to show the best RNN cell structure for each behavior. In the first experiment, we evaluated the LSTM-Vanilla model and 11 of its variants created based on one alteration in its basic architecture that consists in (1) removing (NIG, NFG, NOG, NIAF, NFAF, NOAF, and NCAF), (2) adding (PC and FGR), or

CRediT authorship contribution statement

Rohaifa Khaldi: Conceptualization, Methodology, Software, Formal analysis, Investigation, Writing – original draft, Writing – review & editing, Visualization. Abdellatif El Afia: Validation, Supervision. Raddouane Chiheb: Supervision. Siham Tabik: Methodology, Validation, Resources, Writing – original draft, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was partially supported by DETECTOR (A-RNM-256-UGR18 Universidad de Granada/FEDER), LifeWatch SmartEcomountains (LifeWatch-2019-10-UGR-01 Ministerio de Ciencia e Innovación/Universidad de Granada/FEDER), DeepL-ISCO (A-TIC-458-UGR18 Ministerio de Ciencia e Innovación/FEDER), and BigDDL-CET (P18-FR-4961 Ministerio de Ciencia e Innovación/Universidad de Granada/FEDER).

References (124)

  • Kaplan, D. T. (1994). Exceptional events as evidence for determinism. Physica D: Nonlinear Phenomena.
  • Khaldi, R., et al. (2019). Forecasting of weekly patient visits to emergency department: Real case study. Procedia Computer Science.
  • Kwiatkowski, D., et al. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? Journal of Econometrics.
  • Lim, T. P., et al. (2007). Chaotic time series prediction and additive white Gaussian noise. Physics Letters A.
  • Liu, Y. (2019). Novel volatility forecasting using deep learning–long short term memory recurrent neural networks. Expert Systems with Applications.
  • Liu, H., et al. (2021). Intelligent modeling strategies for forecasting air quality time series: A review. Applied Soft Computing.
  • Maeng, K., et al. (2020). Demand forecasting for the 5G service market considering consumer preference and purchase delay behavior. Telematics and Informatics.
  • Matilla-García, M., et al. (2010). A new test for chaos and determinism based on symbolic dynamics. Journal of Economic Behavior & Organization.
  • Papacharalampous, G., et al. (2020). Hydrological time series forecasting using simple combinations: Big data testing and investigations on one-year ahead river flow predictability. Journal of Hydrology.
  • Parmezan, A. R. S., et al. (2019). Evaluation of statistical and machine learning models for time series prediction: Identifying the state-of-the-art and the best conditions for the use of each model. Information Sciences.
  • Sagheer, A., et al. (2019). Time series forecasting of petroleum production using deep LSTM recurrent networks. Neurocomputing.
  • Salles, R., et al. (2019). Nonstationary time series transformation methods: An experimental review. Knowledge-Based Systems.
  • Sangiorgio, M., et al. (2021). Forecasting of noisy chaotic systems with deep neural networks. Chaos, Solitons & Fractals.
  • Abdulkarim, S. (2016). Time series prediction with simple recurrent neural networks. Bayero Journal of Pure and Applied Sciences.
  • Akaike, H. (1969). Fitting autoregressive models for prediction. Annals of the Institute of Statistical Mathematics.
  • Amemiya, T. (1980). Selection of regressors. International Economic Review.
  • Athiyarath, S., et al. (2020). A comparative study and analysis of time series forecasting techniques. SN Computer Science.
  • Benavoli, A., et al. (2016). Should we really use post-hoc tests based on mean-ranks? The Journal of Machine Learning Research.
  • Bianchi, F. M., et al. (2017). An overview and comparative analysis of recurrent neural networks for short term load forecasting.
  • Boaretto, B., et al. (2021). Discriminating chaotic and stochastic time series using permutation entropy and artificial neural networks. Scientific Reports.
  • Box, G. E., et al.
  • Bukhari, A. H., et al. (2020). Fractional neuro-sequential ARFIMA-LSTM for financial market forecasting. IEEE Access.
  • Cencini, M., et al. (2000). Chaos or noise: Difficulties of a distinction. Physical Review E.
  • Chatfield, C.
  • Cho, K., et al. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation.
  • Choubin, B., et al. (2018). Precipitation forecasting using classification and regression trees (CART) model: A comparative study of different approaches. Environmental Earth Sciences.
  • Crone, S. (2008). NN5 forecasting competition for artificial neural networks & computational intelligence.
  • Dau, H. A., et al. (2019). The UCR time series archive. IEEE/CAA Journal of Automatica Sinica.
  • Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research.
  • Dey, R., et al. Gate-variants of gated recurrent unit (GRU) neural networks.
  • Dickey, D. A., et al. (1979). Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association.
  • Divina, F., et al. (2019). A comparative study of time series forecasting methods for short term electric energy consumption prediction in smart buildings. Energies.
  • Eckmann, J.-P., et al. (1985). Ergodic theory of chaos and strange attractors. The Theory of Chaotic Attractors.
  • Findley, D. F. (1991). Counterexamples to parsimony and BIC. Annals of the Institute of Statistical Mathematics.
  • Fischer, T., et al.
  • Friedman, M. (1940). A comparison of alternative tests of significance for the problem of m rankings. The Annals of Mathematical Statistics.
  • Garcia, S., et al. (2008). An extension on "Statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons. Journal of Machine Learning Research.
  • Gers, F. A., et al. Recurrent nets that time and count.
  • Gers, F. A., et al. (2001). LSTM recurrent networks learn simple context-free and context-sensitive languages. IEEE Transactions on Neural Networks.
  • Gers, F. A., et al. (2000). Learning to forget: Continual prediction with LSTM. Neural Computation.