DSTP-RNN: A dual-stage two-phase attention-based recurrent neural network for long-term and multivariate time series prediction

https://doi.org/10.1016/j.eswa.2019.113082

Highlights

  • We propose DSTP-RNN and DSTP-RNN-Ⅱ for long-term time series prediction.

  • We enhance the attention to spatio-temporal relationships of time series.

  • We study the deep spatial attention mechanism and provide an interpretation.

  • Our methods outperform nine baseline methods on four datasets.

Abstract

Long-term prediction of multivariate time series remains an important but challenging problem. The key to solving it is capturing (1) the spatial correlations between different series at the same time, (2) the spatio-temporal relationships at different times, and (3) the long-term dependency of the temporal relationships between different series. Attention-based recurrent neural networks (RNNs) can effectively represent and learn the dynamic spatio-temporal relationships between exogenous series and target series, but they perform well only in one-step and short-term prediction. In this paper, inspired by the human attention mechanism, including the dual-stage two-phase (DSTP) model and the influence mechanism of target and non-target information, we propose the DSTP-based RNN (DSTP-RNN) and DSTP-RNN-Ⅱ for long-term time series prediction. Specifically, we first propose the DSTP-based structure to enhance the spatial correlations between exogenous series: the first phase produces violent but decentralized response weights, while the second phase leads to stationary and concentrated response weights. Then, we employ multiple attentions on the target series to boost the long-term dependency. Finally, we study the performance of the deep spatial attention mechanism and provide an interpretation. Experimental results demonstrate that the present work can be successfully used to develop expert or intelligent systems for a wide range of applications, with state-of-the-art performance superior to nine baseline methods on four datasets from the fields of energy, finance, environment and medicine. Overall, the present work carries significant value not merely in the domain of machine intelligence and deep learning, but also in many application fields.

Introduction

With the development of the Internet of Things and Big Data, data collection for specific objects is carried out across a number of different feature dimensions (Le & Ge, 2019). Time series are therefore mostly presented in multivariate form, as in energy consumption forecasting (Candanedo, Feldheim, & Deramaix, 2017), financial market prediction (Moews, Herrmann, & Ibikunle, 2018; Qin, Song, Cheng, Cheng, & Cottrell, 2017), environment forecasting (Zamora-Martínez, Romeu, Botella-Rocamora, & Pardo, 2014), and heart and brain signal analysis (Fernandez-Fraga, Aceves-Fernandez, Pedraza-Ortega, & Ramos-Arreguin, 2018). Moreover, single-step or short-term prediction of time series has limited application prospects in many fields; long-term prediction is more meaningful. For example, compared with predicting the value at the next moment, it makes more sense for expert and intelligent systems to predict weather or energy changes over a period of time.

However, long-term prediction of multivariate time series is still a challenging problem, mainly reflected in the feature representation and selection mechanisms for the spatio-temporal relationships between different series. Specifically, the three major challenges, presented in Fig. 1(c), are representing and learning (1) the spatial correlations between different attributes at the same time, (2) the spatio-temporal relationships between different attributes at different times, and (3) the temporal relationships between different series (Monidipa & Ghosh, 2019; Qin et al., 2017; Yunzhe et al., 2018).

Although time series prediction has attracted wide attention in the research community, typical methods, e.g., autoregressive integrated moving average (ARIMA) models (Amini, Kargarian, & Karabasoglu, 2016), kernel methods (Jie & Zio, 2016), and RNN methods (Chen, Xin, She, & Min, 2017), mainly focus on solving one aspect of the dynamic spatio-temporal relationships. Hence, these methods cannot achieve accurate and robust long-term prediction of multivariate time series. Moreover, attention-based RNNs can effectively represent and learn spatio-temporal correlations in time series, but these methods have been successfully applied only to single-step and short-term prediction (Qin et al., 2017; Yuxuan, Songyu, Junbo, Xiuwen, & Yu, 2018). The first motivation of this paper, therefore, is to develop an intelligent model for representing and learning the spatio-temporal relationships in time series, which can achieve accurate long-term prediction and provide a reliable expert and intelligent system in the aforementioned fields.

The proposed intelligent models are inspired by the DSTP model of human attention structures and the target and non-target information mechanism of human neuron signals (Ronald, Marco, & Carola, 2010). On the one hand, in the first stage of the DSTP model, the first phase produces a violent but decentralized response, while the second phase produces a stationary and concentrated response (Ronald et al., 2010). When designing artificial neural network structures, the two stages can be embodied as spatial attention and temporal attention, and the two phases can be reflected in the repeated filtering of spatial correlations within the spatial attention stage. Hence, we propose the DSTP-RNN model with a dual-stage two-phase attention structure to learn more robust spatio-temporal relationships in time series. On the other hand, the stimulation of neuron signals shows that both target signals and non-target signals have a certain effect, because perceptual filtering is imperfect (Ronald et al., 2010). In fact, the supervised dataset reconstruction of the target series is the key to applying traditional machine learning methods to time series prediction, which shows the importance of the past information of the target series. Meanwhile, the RNN method, which forecasts future values based on past values, also shows that the time dependency relies on the series' own past information. Therefore, we further develop the DSTP-RNN-Ⅱ model, which pays more attention to the spatio-temporal relationships between target series and exogenous series.

Furthermore, the attention mechanism of human vision has a multi-layer neuron structure, which is widely exploited in natural language processing (Vaswani et al., 2017) and computer vision (Li, Zeng, Shan, & Chen, 2018). Naturally, we study a deep attention mechanism within the spatial attention. Considering the development of attention mechanisms in deep learning, the second motivation of this paper is to study novel attention structures suitable for representing and learning the spatio-temporal relationships in time series. Consequently, we study the hierarchical attention mechanism (DSTP-RNN), the hierarchical and parallel hybrid attention mechanism (DSTP-RNN-Ⅱ), and the deep attention mechanism (DeepAttn).

To achieve these two motivations, we enhance the focus on spatial correlations through the DSTP-based model, and enhance the attention to temporal relationships through the embedding of target information, thus capturing more accurate spatio-temporal relationships in time series prediction. The contributions of our work are four-fold:

  • DSTP-RNN. Inspired by the DSTP model of human attention (Ronald et al., 2010), we propose DSTP-RNN to represent and learn robust spatio-temporal relationships in time series. The two phases refer to two consecutive attention modules, without and with the target series, that yield spatial correlations; the two phases differ in their susceptibility to interference. The dual stages refer to the spatial attention mechanism applied to the original series and the temporal attention mechanism applied to the hidden states of the last spatial attention.

  • Target and non-target information mechanism. Enlightened by the target and non-target information mechanism of human neuron signals (Ronald et al., 2010), we develop DSTP-RNN-Ⅱ to extract the spatial correlations between target series and exogenous series based on a parallel spatial attention module. Furthermore, we attend more closely to the past information of the target series to better learn long-term dependency. Specifically, we embed the past information of the target series corresponding to the exogenous series in the last-phase spatial attention module.

  • Deep spatial attention. Motivated by the multi-layer structure of biological neural networks (Fukushima & Miyake, 1982), we further study the effectiveness of a deep spatial attention mechanism on spatio-temporal relationships and provide interpretation experiments. Overall, the present paper systematically provides a reference for expert and intelligent systems in time series prediction based on attention-based RNN methods, since seven attention-based RNN models, including the three newly proposed models, are compared.

  • Application in many fields. Experimental results demonstrate that the present work can be successfully used to develop expert and intelligent systems for a wide range of applications, with state-of-the-art performance superior to nine baseline methods on four datasets from the fields of energy, finance, environment and medicine.

Section snippets

Related work

Our work is mainly related to two lines of research: time series prediction methods and attention-based neural network structures.

Notation

Given n (n ≥ 1) exogenous series and one target series, we use x^k = (x^k_1, x^k_2, …, x^k_T)^⊤ ∈ ℝ^T to represent the k-th exogenous series within window size T, and use X = (x^1, x^2, …, x^n)^⊤ = (x_1, x_2, …, x_T) ∈ ℝ^{n×T} to represent all exogenous series within window size T, where x_t ∈ ℝ^n collects the n exogenous values at time t.

As for the notation related to the target series, we employ Y = (y_1, y_2, …, y_T)^⊤ ∈ ℝ^T to represent the target series within window size T, and employ Z = (z_1, z_2, …, z_T) ∈ ℝ^{(n+1)×T} to represent the set of outputs of the first-phase attention (…
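The notation above maps directly onto array shapes. The following NumPy sketch of slicing n exogenous series and one target series into supervised windows of length T is illustrative only — the function name `make_windows` and the toy data are not from the paper:

```python
import numpy as np

def make_windows(exog, target, T):
    """Slice n exogenous series and one target series into supervised
    windows of length T: each sample is X in R^{n x T} (all exogenous
    series in the window), Y in R^T (past target values in the window),
    and the target value right after the window."""
    n, total = exog.shape
    X, Y, y_next = [], [], []
    for t in range(total - T):
        X.append(exog[:, t:t + T])    # exogenous window, shape (n, T)
        Y.append(target[t:t + T])     # target window, shape (T,)
        y_next.append(target[t + T])  # value to predict
    return np.stack(X), np.stack(Y), np.array(y_next)

# toy data: n = 3 exogenous series observed over 100 time steps
rng = np.random.default_rng(0)
exog = rng.normal(size=(3, 100))
target = rng.normal(size=100)

X, Y, y_next = make_windows(exog, target, T=10)
print(X.shape, Y.shape, y_next.shape)  # (90, 3, 10) (90, 10) (90,)
```

This is the standard "supervised dataset reconstruction" the introduction refers to: each window carries both the exogenous matrix X and the past target series Y that DSTP-RNN-Ⅱ feeds into its second attention phase.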

Models

Fig. 1(a) and (b) present the overall frameworks of the proposed DSTP-RNN and DSTP-RNN-Ⅱ, respectively. The dual stages refer to the learning of spatial correlations in the first stage and the learning of temporal relationships in the second stage, named spatial attention (red boxes in Fig. 1) and temporal attention (blue boxes in Fig. 1), respectively. The spatial attention module consists of a two-phase structure. The first phase produces violent but decentralized response weights from the …
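As a rough illustration of the two-phase spatial attention described above, here is a minimal PyTorch sketch. It captures only the high-level idea — score and reweight each series, then re-attend with the past target series concatenated in — and omits the encoder hidden-state conditioning used in the paper; the module name and layer sizes are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class SpatialAttentionPhase(nn.Module):
    """One phase of spatial attention: score each of the input series
    over a window of length T, softmax across series, and reweight."""
    def __init__(self, T):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(T, T), nn.Tanh(), nn.Linear(T, 1))

    def forward(self, series):                    # series: (batch, n, T)
        alpha = torch.softmax(self.score(series).squeeze(-1), dim=1)  # (batch, n)
        return alpha.unsqueeze(-1) * series, alpha

T, n, batch = 10, 3, 4
phase1 = SpatialAttentionPhase(T)       # first phase: exogenous series only
phase2 = SpatialAttentionPhase(T)       # second phase: plus target series

x = torch.randn(batch, n, T)            # exogenous series X
y = torch.randn(batch, 1, T)            # past target series Y
z1, a1 = phase1(x)                      # "violent but decentralized" weights
z2, a2 = phase2(torch.cat([z1, y], dim=1))  # "stationary and concentrated" weights
print(z2.shape, a2.shape)               # torch.Size([4, 4, 10]) torch.Size([4, 4])
```

The n + 1 rows of the second-phase output correspond to the set Z ∈ ℝ^{(n+1)×T} in the Notation section: the reweighted exogenous series plus the embedded target series.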

Experiments

We implement all proposed models and the neural network baseline methods in the PyTorch framework. In this section, we first describe four datasets from different fields and introduce the baseline methods. Then, we present the hyperparameter settings and model evaluation metrics. Finally, extensive experiments demonstrate the effectiveness of our models. In particular, we compare the effect of each module on the experimental results, and we also provide an interpretation of the attention-based …

Conclusion and future work

In this paper, we propose two novel attention-based RNNs for long-term and multivariate time series prediction, i.e., DSTP-RNN and DSTP-RNN-Ⅱ. In general, our models enhance the attention mechanism over both spatial correlations and temporal relationships to better learn spatio-temporal relationships, and thus outperform the state-of-the-art methods on four datasets and at different prediction horizons. Our interpretation of the attention-based models provides a developed idea for further …

CRediT authorship contribution statement

Yeqi Liu: Conceptualization, Methodology, Writing - original draft. Chuanyang Gong: Data curation, Writing - review & editing. Ling Yang: Data curation, Writing - review & editing. Yingyi Chen: Funding acquisition, Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work is supported by the National Key Research and Development Program of China "Next generation precision aquaculture: R&D on intelligent measurement, control and equipment technologies" (no. 2017YFE0122100), and the Science and Technology Program of Beijing "Research and demonstration of technologies and equipment for intelligent control of large-scale healthy cultivation of freshwater fish" (no. Z171100001517016).

References (54)

  • Bahdanau, D., et al. Neural machine translation by jointly learning to align and translate.
  • Bontempi, G., et al. (2013).
  • Chang, Y.Y., Sun, F.Y., Wu, Y.H., & Lin, S.D. (2018). A memory-network based solution for multivariate time-series...
  • Cho, K., et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation.
  • Cho, K., Van Merriënboer, B., Bahdanau, D., & Bengio, Y. (2014). On the properties of neural machine translation:...
  • Chung, J., et al. Empirical evaluation of gated recurrent neural networks on sequence modeling.
  • Cinar, Y.G., et al. (2018). Period-aware content attention RNNs for time series forecasting with missing values. Neurocomputing.
  • Cinar, Y.G., et al. (2017). Position-based content attention for time series forecasting with sequence-to-sequence RNNs.
  • Choi, E., et al. (2016). RETAIN: Interpretable predictive model in healthcare using reverse time attention mechanism. In Advances in Neural Information Processing Systems.
  • Fukushima, K., et al. (1982). Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13.
  • Gangi, M.A.D., & Federico, M. (2018). Deep neural machine translation with weakly-recurrent units....
  • Geetha, A., et al. (2016). Time-series modelling and forecasting: Modelling of rainfall prediction using ARIMA model. International Journal of Society Systems Science.
  • Gestel, T., et al. (2001). Financial time series prediction using least squares support vector machines within the evidence framework. IEEE Transactions on Neural Networks, 12.
  • Graves, A. (1997). Long short-term memory. Neural Computation.
  • Guo, T., & Lin, T. (2018). Multi-variable LSTM neural network for autoregressive exogenous model....
  • Han, M., et al. (2018). Laplacian echo state network for multivariate time series prediction. IEEE Transactions on Neural Networks and Learning Systems.
  • Sutskever, I., et al. Sequence to sequence learning with neural networks.
