What is the best RNN-cell structure to forecast each time series behavior?

https://doi.org/10.1016/j.eswa.2022.119140

Highlights

  • A taxonomy of time series behaviors was proposed.

  • A taxonomy of Recurrent Neural Network (RNN) cell structures was proposed.

  • A set of 31 RNN cell structures was evaluated.

  • The utility of each component in the Long–Short Term Memory cell was evaluated.

  • The best cell structure for forecasting each time series behavior was provided.

Abstract

It is unquestionable that time series forecasting is of paramount importance in many fields. The machine learning models most widely used to address time series forecasting tasks are Recurrent Neural Networks (RNNs). Typically, those models are built using one of the three most popular cells, the ELMAN, Long–Short Term Memory (LSTM), or Gated Recurrent Unit (GRU) cell; each cell has a different structure and implies a different computational cost. However, it is not clear why and when to use each RNN-cell structure. In fact, there is no comprehensive characterization of all the possible time series behaviors and no guidance on which RNN cell structure is the most suitable for each behavior. The objective of this study is two-fold: it presents a comprehensive taxonomy of all possible time series behaviors (deterministic, random-walk, nonlinear, long-memory, and chaotic), and it provides insights into the best RNN cell structure for each time series behavior. We conducted two experiments: (1) the first evaluates and analyzes the role of each component in the LSTM-Vanilla cell by creating 11 variants, each based on one alteration of its basic architecture (removing, adding, or substituting one cell component); (2) the second evaluates and analyzes the performance of 20 possible RNN-cell structures. To evaluate, compare, and select the best model, different statistical metrics were used: error-based metrics, information criterion-based metrics, a naïve-based metric, and a direction change-based metric. To further improve our confidence in the models’ interpretation and selection, the Friedman test followed by the Wilcoxon–Holm signed-rank procedure was used.

Our results advocate the use and further exploration of the newly created RNN variant, named SLIM, in time series forecasting, thanks to its high ability to accurately predict the different time series behaviors as well as its simple structural design, which does not require expensive temporal and computing resources.
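As a concrete illustration of two of the metric families named above, a naïve-based metric and a direction change-based metric can be computed as follows (a minimal NumPy sketch; both function definitions are our own illustrative versions, not the paper's exact formulas):

```python
import numpy as np

def mase(y_true, y_pred, y_train):
    # Naive-based metric: mean absolute forecast error scaled by the
    # in-sample error of the one-step naive forecast (an illustrative MASE).
    scale = np.mean(np.abs(np.diff(y_train)))
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))) / scale

def direction_change_accuracy(y_true, y_pred):
    # Direction change-based metric: fraction of steps where the forecast
    # moves in the same direction as the actual series.
    return np.mean(np.sign(np.diff(y_true)) == np.sign(np.diff(y_pred)))
```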

Introduction

Many real-world prediction problems involve a temporal dimension and typically require the estimation of numerical sequential data, a task referred to as time series forecasting. Time series forecasting is a cornerstone of data science, playing a pivotal role in almost all domains, including meteorology (Murat, Malinowska, Gos, & Krzyszczak, 2018), natural disasters control (Erdelj, Król, & Natalizio, 2017), energy (Bourdeau, Zhai, Nefzaoui, Guo, & Chatellier, 2019), manufacturing (Wang & Chen, 2018), finance (Liu, 2019), econometrics (Siami-Namini & Namin, 2018), telecommunication (Maeng, Kim, & Shin, 2020), and healthcare (Khaldi, El Afia, & Chiheb, 2019b), to name a few. Accurate time series forecasting requires robust forecasting models.

Currently, Recurrent Neural Network (RNN) models are among the most popular machine learning models for sequential data modeling, including natural language, image/video captioning, and forecasting (Chimmula and Zhang, 2020, Sutskever et al., 2014, Vinyals et al., 2015). Such RNN models are built as a sequence of the same cell structure, for example, the ELMAN cell, the Long–Short Term Memory (LSTM) cell, or the Gated Recurrent Unit (GRU) cell. The simplest RNN cell is ELMAN, which includes a single layer of hidden neurons, whereas the LSTM and GRU cells incorporate a gating mechanism (three gates in LSTM, two in GRU), where each gate is a layer of hidden neurons. Many other cell structures have been introduced in the literature (Lu and Salem, 2017, Mikolov et al., 2014, Pulver and Lyu, 2017, Zhou et al., 2016). However, to solve time series forecasting tasks, the building of RNN models is typically limited to the three aforementioned cell structures (Alkhayat and Mehmood, 2021, Liu et al., 2021, Rajagukguk et al., 2020, Runge and Zmeureanu, 2021, Sezer et al., 2020), as they provide very good accuracy (Runge and Zmeureanu, 2021, Sezer et al., 2020).
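The structural contrast between an ungated and a gated cell is easiest to see in a single forward step. The following is a minimal NumPy sketch of our own (not the authors' implementation) of an ELMAN step and a GRU step:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elman_step(x_t, h_prev, Wx, Wh, b):
    # ELMAN: a single layer of hidden neurons, no gates.
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

def gru_step(x_t, h_prev, P):
    # GRU: two gates (update z and reset r), each itself a layer of neurons.
    z = sigmoid(P["Wz"] @ x_t + P["Uz"] @ h_prev + P["bz"])   # update gate
    r = sigmoid(P["Wr"] @ x_t + P["Ur"] @ h_prev + P["br"])   # reset gate
    h_cand = np.tanh(P["Wh"] @ x_t + P["Uh"] @ (r * h_prev) + P["bh"])
    return (1.0 - z) * h_prev + z * h_cand                    # gated interpolation
```

Each extra gate adds a full layer of weights, which is where the computational-cost differences between ELMAN, GRU, and LSTM come from.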

Nevertheless, building robust RNN models for time series forecasting is still a challenging task, as there is not yet a clear understanding of time series data itself, and hence very little knowledge exists about which cell structure is the most appropriate for each data type. In general, when facing a new problem, practitioners select one of the most popular cells, usually LSTM, and use it as a building block for the RNN model without any guarantee of the appropriateness of this cell for the data at hand. The objective of this work is two-fold. It presents a comprehensive characterization of time series behaviors and provides guidelines on the best RNN cell structure for each behavior. As far as we know, this is the first work providing such insights. The main contributions of this study can be summarized as follows:

  • To provide a better understanding of time series data by presenting a comprehensive characterization of their behaviors.

  • To determine the most appropriate cell structure for each time series behavior (i.e., whether a specific cell structure should be avoided for certain behaviors).

  • To identify differences in predictability between behaviors (i.e., whether certain behaviors are easier or harder to predict across all cell models).

  • To provide useful guidelines that can assist decision-makers and scholars in selecting the most suitable RNN-cell structure from both a computational and a performance point of view.

The remainder of this study is organized as follows: Section 2 reviews related work. Section 3 presents a taxonomy of time series behaviors. Section 4 presents a taxonomy of RNN cells. Section 5 describes the experiments. Section 6 presents and discusses the obtained results. Finally, the last section concludes the findings and sheds light on future research directions.

Section snippets

Related works

The last decades have seen an explosion of time series data acquired by automated data collection devices such as monitors, IoT devices, and sensors (Bourdeau et al., 2019, Erdelj et al., 2017, Murat et al., 2018). The collected time series describe different quantitative values: stock price, amount of sales, electricity load demand, weather temperature, etc. In parallel, a large number of comparative studies have been carried out in the forecasting area (Athiyarath et al., 2020, Bianchi et

Taxonomy of times series behaviors

As far as we know, this is the first work introducing a complete formal characterization of real-world time series. Time series emerging from real-world applications can either follow a stochastic mechanism or a chaotic mechanism and are usually contaminated by white noise (Boaretto et al., 2021, Box et al., 2015, Cencini et al., 2000, Wales, 1991, Zunino et al., 2012).
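In symbols (illustrative notation of our own, paraphrasing this framing), an observed value combines the underlying generating mechanism with additive white noise:

```latex
y_t = s_t + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2)
```

where the signal $s_t$ may be deterministic (e.g., periodic), stochastic (e.g., a random walk, $s_t = s_{t-1} + \eta_t$), or chaotic (e.g., a logistic-map orbit, $s_{t+1} = r\,s_t(1 - s_t)$).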

Taxonomy of RNN cells

Humans do not start their thinking from zero every second; our thoughts persist in the memory of our brains. For example, as readers work through this paper, they understand each word based on their understanding of the words before. The absence of memory is the major shortcoming of traditional machine learning models, particularly feed-forward neural networks (FNNs). To overcome this limitation, RNNs integrate the concept of feedback connections into their structure (Fig. 6, where xt
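In compact form (standard RNN notation, supplied here since Fig. 6 is not reproduced in this snippet), the feedback connection makes the hidden state depend on both the current input and the previous hidden state:

```latex
h_t = f\left(W_x x_t + W_h h_{t-1} + b_h\right), \qquad \hat{y}_t = g\left(W_y h_t + b_y\right)
```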

Experimental structure

Two experiments have been carried out in this study. The first experiment analyzes the utility of each LSTM-Vanilla cell component in forecasting the five time series behaviors, while the second experiment evaluates different variants of RNN cell structures in forecasting these behaviors. In this section, we first describe the process we followed to generate the dataset for each time series behavior (Section 5.1). Then, we present the selected models for the first and the second experiment
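The full generation procedure is in Section 5.1 and is not reproduced in this snippet; the following is a minimal sketch of how the five behaviors could be synthesized (every generator choice and parameter below is our own assumption, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Deterministic: a noiseless periodic signal.
deterministic = np.sin(2 * np.pi * np.arange(n) / 50)

# Random walk: cumulative sum of white noise.
random_walk = np.cumsum(rng.normal(size=n))

# Nonlinear: a threshold AR(1) process whose coefficient depends on the regime.
nonlinear = np.zeros(n)
for t in range(1, n):
    phi = 0.9 if nonlinear[t - 1] > 0 else -0.3
    nonlinear[t] = phi * nonlinear[t - 1] + rng.normal()

# Long-memory: ARFIMA(0, d, 0) via truncated fractional-differencing weights.
d = 0.4
k = np.arange(1, n)
psi = np.cumprod((k - 1 + d) / k)   # psi_j = Gamma(j + d) / (Gamma(d) Gamma(j + 1))
eps = rng.normal(size=n)
long_memory = np.empty(n)
long_memory[0] = eps[0]
for t in range(1, n):
    long_memory[t] = eps[t] + psi[:t] @ eps[t - 1::-1]

# Chaotic: logistic map in its chaotic regime (r = 4).
chaotic = np.zeros(n)
chaotic[0] = 0.3
for t in range(1, n):
    chaotic[t] = 4.0 * chaotic[t - 1] * (1.0 - chaotic[t - 1])
```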

Results and discussion

In this section, we present the results of the two conducted experiments: (1) The first experiment consists of evaluating and analyzing the role of each component in the LSTM-Vanilla cell with respect to the five time series behaviors. The evaluated architectures were generated by removing (NIG, NFG, NOG, NIAF, NFAF, NOAF, and NCAF), adding (PC and FGR), or substituting (FB1 and CIFG) one cell component. (2) The second experiment aims at evaluating and analyzing the performance of a multitude
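To make the variant naming concrete, the following sketches one LSTM-Vanilla step together with the CIFG substitution (a minimal NumPy rendering of the standard LSTM equations; the expansion of the acronyms, e.g., NIG = no input gate, NFG = no forget gate, NOG = no output gate, CIFG = coupled input–forget gate, is our reading, not quoted from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, P, coupled=False):
    # One LSTM-Vanilla step; coupled=True ties the input gate to the
    # forget gate (the CIFG substitution).
    f = sigmoid(P["Wf"] @ x_t + P["Uf"] @ h_prev + P["bf"])      # forget gate
    if coupled:
        i = 1.0 - f                                              # CIFG: i := 1 - f
    else:
        i = sigmoid(P["Wi"] @ x_t + P["Ui"] @ h_prev + P["bi"])  # input gate (dropped in NIG)
    o = sigmoid(P["Wo"] @ x_t + P["Uo"] @ h_prev + P["bo"])      # output gate (dropped in NOG)
    g = np.tanh(P["Wg"] @ x_t + P["Ug"] @ h_prev + P["bg"])      # input activation (NIAF drops tanh)
    c_t = f * c_prev + i * g          # cell state update (NFG drops the f term)
    h_t = o * np.tanh(c_t)            # output activation (NOAF drops this tanh)
    return h_t, c_t
```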

Conclusions

In this paper, we proposed a comprehensive taxonomy of the main time series behaviors, which are: deterministic, random-walk, nonlinear, long-memory, and chaotic. Then, we conducted two experiments to show the best RNN cell structure for each behavior. In the first experiment, we evaluated the LSTM-Vanilla model and 11 of its variants created based on one alteration in its basic architecture that consists in (1) removing (NIG, NFG, NOG, NIAF, NFAF, NOAF, and NCAF), (2) adding (PC and FGR), or

CRediT authorship contribution statement

Rohaifa Khaldi: Conceptualization, Methodology, Software, Formal analysis, Investigation, Writing – original draft, Writing – review & editing, Visualization. Abdellatif El Afia: Validation, Supervision. Raddouane Chiheb: Supervision. Siham Tabik: Methodology, Validation, Resources, Writing – original draft, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was partially supported by DETECTOR (A-RNM-256-UGR18 Universidad de Granada/FEDER), LifeWatch SmartEcomountains (LifeWatch-2019-10-UGR-01 Ministerio de Ciencia e Innovación/Universidad de Granada/FEDER), DeepL-ISCO (A-TIC-458-UGR18 Ministerio de Ciencia e Innovación/FEDER), and BigDDL-CET (P18-FR-4961 Ministerio de Ciencia e Innovación/Universidad de Granada/FEDER).

References (124)

  • Kaplan, D. T. (1994). Exceptional events as evidence for determinism. Physica D: Nonlinear Phenomena.
  • Khaldi, R., et al. (2019). Forecasting of weekly patient visits to emergency department: Real case study. Procedia Computer Science.
  • Kwiatkowski, D., et al. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? Journal of Econometrics.
  • Lim, T. P., et al. (2007). Chaotic time series prediction and additive white Gaussian noise. Physics Letters A.
  • Liu, Y. (2019). Novel volatility forecasting using deep learning–long short term memory recurrent neural networks. Expert Systems with Applications.
  • Liu, H., et al. (2021). Intelligent modeling strategies for forecasting air quality time series: A review. Applied Soft Computing.
  • Maeng, K., et al. (2020). Demand forecasting for the 5G service market considering consumer preference and purchase delay behavior. Telematics and Informatics.
  • Matilla-García, M., et al. (2010). A new test for chaos and determinism based on symbolic dynamics. Journal of Economic Behavior & Organization.
  • Papacharalampous, G., et al. (2020). Hydrological time series forecasting using simple combinations: Big data testing and investigations on one-year ahead river flow predictability. Journal of Hydrology.
  • Parmezan, A. R. S., et al. (2019). Evaluation of statistical and machine learning models for time series prediction: Identifying the state-of-the-art and the best conditions for the use of each model. Information Sciences.
  • Sagheer, A., et al. (2019). Time series forecasting of petroleum production using deep LSTM recurrent networks. Neurocomputing.
  • Salles, R., et al. (2019). Nonstationary time series transformation methods: An experimental review. Knowledge-Based Systems.
  • Sangiorgio, M., et al. (2021). Forecasting of noisy chaotic systems with deep neural networks. Chaos, Solitons & Fractals.
  • Abdulkarim, S. (2016). Time series prediction with simple recurrent neural networks. Bayero Journal of Pure and Applied Sciences.
  • Akaike, H. (1969). Fitting autoregressive models for prediction. Annals of the Institute of Statistical Mathematics.
  • Amemiya, T. (1980). Selection of regressors. International Economic Review.
  • Athiyarath, S., et al. (2020). A comparative study and analysis of time series forecasting techniques. SN Computer Science.
  • Benavoli, A., et al. (2016). Should we really use post-hoc tests based on mean-ranks? The Journal of Machine Learning Research.
  • Bianchi, F. M., et al. (2017). An overview and comparative analysis of recurrent neural networks for short term load forecasting.
  • Boaretto, B., et al. (2021). Discriminating chaotic and stochastic time series using permutation entropy and artificial neural networks. Scientific Reports.
  • Box, G. E., et al.
  • Bukhari, A. H., et al. (2020). Fractional neuro-sequential ARFIMA-LSTM for financial market forecasting. IEEE Access.
  • Cencini, M., et al. (2000). Chaos or noise: Difficulties of a distinction. Physical Review E.
  • Chatfield, C.
  • Cho, K., et al. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation.
  • Choubin, B., et al. (2018). Precipitation forecasting using classification and regression trees (CART) model: A comparative study of different approaches. Environmental Earth Sciences.
  • Crone, S. (2008). NN5 forecasting competition for artificial neural networks & computational intelligence.
  • Dau, H. A., et al. (2019). The UCR time series archive. IEEE/CAA Journal of Automatica Sinica.
  • Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research.
  • Dey, R., et al. Gate-variants of gated recurrent unit (GRU) neural networks.
  • Dickey, D. A., et al. (1979). Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association.
  • Divina, F., et al. (2019). A comparative study of time series forecasting methods for short term electric energy consumption prediction in smart buildings. Energies.
  • Eckmann, J.-P., et al. (1985). Ergodic theory of chaos and strange attractors. The Theory of Chaotic Attractors.
  • Findley, D. F. (1991). Counterexamples to parsimony and BIC. Annals of the Institute of Statistical Mathematics.
  • Fischer, T., et al.
  • Friedman, M. (1940). A comparison of alternative tests of significance for the problem of m rankings. The Annals of Mathematical Statistics.
  • Garcia, S., et al. (2008). An extension on "Statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons. Journal of Machine Learning Research.
  • Gers, F. A., et al. Recurrent nets that time and count.
  • Gers, F. A., et al. (2001). LSTM recurrent networks learn simple context-free and context-sensitive languages. IEEE Transactions on Neural Networks.
  • Gers, F. A., et al. (2000). Learning to forget: Continual prediction with LSTM. Neural Computation.