DSTP-RNN: A dual-stage two-phase attention-based recurrent neural network for long-term and multivariate time series prediction
Introduction
With the development of the Internet of Things and Big Data, data about specific objects are collected along many different feature dimensions (Le & Ge, 2019). Time series are therefore mostly multivariate, for example in energy consumption forecasting (Candanedo, Feldheim, & Deramaix, 2017), financial market prediction (Moews, Herrmann, & Ibikunle, 2018; Qin, Song, Cheng, Cheng, & Cottrell, 2017), environment forecasting (Zamora-Martínez, Romeu, Botella-Rocamora, & Pardo, 2014), and heart and brain signal analysis (Fernandez-Fraga, Aceves-Fernandez, Pedraza-Ortega, & Ramos-Arreguin, 2018). Moreover, single-step and short-term prediction of time series have limited application prospects, whereas long-term prediction is valuable in many fields: compared to predicting the value at the next moment, it is more useful for expert and intelligent systems to predict weather or energy changes over a period of time.
However, long-term prediction of multivariate time series remains a challenging problem, mainly because of the difficulty of representing and selecting features of the spatio-temporal relationships between different series. Specifically, the three major challenges, illustrated in Fig. 1(c), are representing and learning (1) the spatial correlations between different attributes at the same time, (2) the spatio-temporal relationships between different attributes at different times, and (3) the temporal relationships between different series (Monidipa & Ghosh, 2019; Qin et al., 2017; Yunzhe et al., 2018).
Although time series prediction has attracted wide attention in the research community, typical methods, e.g., autoregressive integrated moving average (ARIMA) models (Amini, Kargarian, & Karabasoglu, 2016), kernel methods (Jie & Zio, 2016), and RNN methods (Chen, Xin, She, & Min, 2017), mainly address one aspect of these dynamic spatio-temporal relationships. Hence, such methods cannot achieve accurate and robust long-term prediction of multivariate time series. Moreover, attention-based RNNs can effectively represent and learn spatio-temporal correlations in time series, but so far they have only been applied successfully to single-step and short-term prediction (Qin et al., 2017; Yuxuan, Songyu, Junbo, Xiuwen, & Yu, 2018). The first motivation of this paper, therefore, is to develop an intelligent model that represents and learns the spatio-temporal relationships in time series, achieves accurate long-term prediction, and thereby provides a reliable expert and intelligent system for the aforementioned fields.
The proposed intelligent models are inspired by the DSTP model of human attention structures and the target and non-target information mechanism of human neuron signals (Ronald, Marco, & Carola, 2010). On the one hand, in the DSTP model, the first phase produces a violent but decentralized response, while the second phase produces a stationary and concentrated response (Ronald et al., 2010). When designing artificial neural network structures, attention at different stages can be embodied as two stages, i.e., spatial attention and temporal attention, and attention at different phases can be realized as multiple rounds of filtering of spatial correlations within the spatial attention stage. Hence, we propose the DSTP-RNN model, with a dual-stage two-phase attention structure, to learn more robust spatio-temporal relationships in time series. On the other hand, studies of neuron signals show that both target signals and non-target signals have an effect, because perceptual filtering is imperfect (Ronald et al., 2010). In fact, the supervised dataset reconstruction of the target series is the key to applying traditional machine learning methods to time series prediction, which shows the importance of the past information of the target series. Likewise, RNN methods, which forecast future values from past values, show that the temporal dependency rests on the series' own past information. Therefore, we further develop the DSTP-RNN-Ⅱ model, which pays more attention to the spatio-temporal relationships between the target series and the exogenous series.
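As a minimal illustration of the supervised dataset reconstruction mentioned above, the sliding-window construction below pairs the past T steps of the target and exogenous series with a value `horizon` steps ahead. The helper name `make_supervised` and the exact windowing convention are assumptions for illustration, not code from the paper:

```python
import numpy as np

def make_supervised(target, exog, T, horizon):
    """Slide a window of length T over the series.

    Each sample pairs the past T steps of the exogenous series and the
    target series with the target value `horizon` steps after the window.
    """
    X, Y_hist, y = [], [], []
    for t in range(len(target) - T - horizon + 1):
        X.append(exog[t:t + T])                 # exogenous window, shape (T, n)
        Y_hist.append(target[t:t + T])          # past target values, shape (T,)
        y.append(target[t + T + horizon - 1])   # label: value `horizon` steps ahead
    return np.array(X), np.array(Y_hist), np.array(y)
```

With a series of length 100, T = 10 and horizon = 5, this yields 86 training samples; long-term prediction corresponds to a large `horizon` (or to predicting a whole window of future values instead of a single label).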
Furthermore, the attention mechanism of human vision has a multi-layer neuron structure, which has been widely exploited in natural language processing (Vaswani et al., 2017) and computer vision (Li, Zeng, Shan, & Chen, 2018). We therefore also study a deep attention mechanism within the spatial attention stage. Considering the development of attention mechanisms in deep learning, the second motivation of this paper is to study novel attention structures suitable for representing and learning the spatio-temporal relationships in time series. Consequently, we study a hierarchical attention mechanism (DSTP-RNN), a hybrid hierarchical-parallel attention mechanism (DSTP-RNN-Ⅱ), and a deep attention mechanism (DeepAttn).
Guided by these two motivations, we enhance the focus on spatial correlations through the DSTP-based models, and enhance the attention to temporal relationships through the embedding of target information, thus capturing more accurate spatio-temporal relationships for time series prediction. The contributions of our work are four-fold:
- •
DSTP-RNN. Inspired by the DSTP model of human attention (Ronald et al., 2010), we propose DSTP-RNN to represent and learn robust spatio-temporal relationships in time series. The two phases are two consecutive attention modules, without and with the target series, that yield spatial correlations; the two phases differ in their susceptibility to interference. The dual stages are the spatial attention mechanism applied to the original series and the temporal attention mechanism applied to the hidden states of the last spatial attention.
- •
Target and non-target information mechanism. Enlightened by the target and non-target information mechanism of human neuron signals (Ronald et al., 2010), we develop DSTP-RNN-Ⅱ, which extracts the spatial correlations between the target series and the exogenous series with a parallel spatial attention module. Furthermore, we attend more closely to the past information of the target series to better learn long-term dependency. Specifically, we embed the past information of the target series, aligned with the exogenous series, in the last-phase spatial attention module.
- •
Deep spatial attention. Motivated by the multi-layer structure of human neural networks (Fukushima & Miyake, 1982), we further study the effectiveness of a deep spatial attention mechanism for learning spatio-temporal relationships and provide interpretation experiments. Overall, this paper systematically provides a reference for expert and intelligent systems in time series prediction based on attention-based RNN methods, since seven attention-based RNN models, including the three newly proposed ones, are compared.
- •
Application in many fields. Experimental results demonstrate that the present work can be used to develop expert and intelligent systems for a wide range of applications, achieving state-of-the-art performance superior to nine baseline methods on four datasets from the fields of energy, finance, environment and medicine, respectively.
Section snippets
Related work
Our work is mainly related to two lines of research: time series prediction methods and attention-based neural network structures.
Notation
Given n (n ≥ 1) exogenous series and one target series, we use x^k = (x^k_1, x^k_2, …, x^k_T)^⊤ ∈ R^T to represent the k-th exogenous series within a window of size T, and use X = (x^1, x^2, …, x^n)^⊤ = (x_1, x_2, …, x_T) ∈ R^{n×T} to represent all exogenous series within the window, where x_t ∈ R^n collects the values of all n exogenous attributes at time t.
As for the notation related to the target series, we employ Y = (y_1, y_2, …, y_T)^⊤ ∈ R^T to represent the target series within the window, and employ Z = (z_1, z_2, …, z_T)^⊤ ∈ R^{(n+1)×T} to represent the set of the output of the first-phase attention …
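The notation can be made concrete with a small NumPy sketch (the array names mirror the symbols above; the concrete sizes n = 3, T = 10 are illustrative only). Rows of X index exogenous series and columns index time, so spatial attention weighs rows while temporal attention weighs columns:

```python
import numpy as np

n, T = 3, 10                 # number of exogenous series, window size
X = np.random.randn(n, T)    # all exogenous series: row k is x^k, column t is x_t
Y = np.random.randn(T)       # target series y_1, ..., y_T
x_k = X[1]                   # one exogenous series over the window, shape (T,)
x_t = X[:, 4]                # all exogenous attributes at one time step, shape (n,)
```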
Models
Fig. 1(a) and (b) present the overall frameworks of the proposed DSTP-RNN and DSTP-RNN-Ⅱ, respectively. The dual stages refer to the learning of spatial correlations in the first stage and of temporal relationships in the second stage, named spatial attention (red boxes in Fig. 1) and temporal attention (blue boxes in Fig. 1), respectively. The spatial attention module has a two-phase structure. The first phase produces a violent but decentralized response weight from the …
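The two-phase spatial weighting can be sketched in NumPy as follows. This is a deliberately simplified illustration: in the actual model the attention scores are computed from LSTM hidden and cell states, whereas the fixed score vectors `W1` and `W2` here are assumptions made only to keep the sketch self-contained:

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def two_phase_spatial_attention(X, Y, W1, W2):
    """Simplified sketch of the dual-phase spatial attention.

    Phase one scores each exogenous series and yields a first, decentralized
    weighting; phase two rescores the reweighted series together with the
    past target series to yield a more concentrated weighting.
    """
    # Phase 1: attention over the n exogenous series (rows of X)
    a1 = softmax(X @ W1)          # shape (n,), sums to 1
    X1 = a1[:, None] * X          # first-phase filtered exogenous series
    # Phase 2: rescore with the past target series stacked in
    Z = np.vstack([X1, Y])        # shape (n + 1, T)
    a2 = softmax(Z @ W2)          # shape (n + 1,), concentrated weights
    return a2[:, None] * Z
```

Stacking Y into the second phase is what distinguishes the "with target series" phase: the target's own past values compete with the exogenous series for attention weight.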
Experiments
We implement all proposed models and the neural network baselines in the PyTorch framework. In this section, we first describe four datasets from different fields and introduce the baseline methods. Then, we present the hyperparameter settings and model evaluation metrics. Finally, extensive experiments demonstrate the effectiveness of our models. In particular, we compare the effect of each module on the experimental results, and we also provide an interpretation of the attention-based …
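For evaluating point forecasts in this setting, metrics such as RMSE, MAE and MAPE are standard; minimal NumPy implementations (not tied to the paper's exact evaluation code) are:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors quadratically."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the errors."""
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    """Mean absolute percentage error (%); undefined when y_true contains zeros."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)
```

For long-term prediction these metrics are computed over every step of the predicted horizon, so a model that drifts late in the horizon is penalized as much as one that errs early.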
Conclusion and future work
In this paper, we propose two novel attention-based RNNs for long-term and multivariate time series prediction, i.e., DSTP-RNN and DSTP-RNN-Ⅱ. In general, our models enhance the attention to both spatial correlations and temporal relationships so as to better learn spatio-temporal relationships, and thus outperform the state-of-the-art methods on four datasets and at different prediction horizons. Our interpretation of the attention-based models provides a starting point for further …
CRediT authorship contribution statement
Yeqi Liu: Conceptualization, Methodology, Writing - original draft. Chuanyang Gong: Data curation, Writing - review & editing. Ling Yang: Data curation, Writing - review & editing. Yingyi Chen: Funding acquisition, Supervision, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work is supported by the National Key Research and Development Program of China “Next generation precision aquaculture: R&D on intelligent measurement, control and equipment technologies” (no. 2017YFE0122100), and the Science and Technology Program of Beijing “Research and Demonstration of technologies equipment capable of intelligent control for large-scale healthy cultivation of freshwater fish” (no. Z171100001517016).
References
- Amini, Kargarian, & Karabasoglu (2016). ARIMA-based decoupled time series forecasting of electric vehicle charging demand for stochastic power system operation. Electric Power Systems Research.
- Chen, Xin, She, & Min (2017). A hybrid time series prediction model based on recurrent neural network and double joint linear-nonlinear extreme learning network for prediction of carbon efficiency in iron ore sintering process. Neurocomputing.
- Li, Zeng, Shan, & Chen (2018). Occlusion aware facial expression recognition using CNN with attention mechanism. IEEE Transactions on Image Processing.
- EA-LSTM: Evolutionary attention-based LSTM for time series prediction. Knowledge-Based Systems (2019).
- Attention-based recurrent neural networks for accurate short-term and long-term dissolved oxygen prediction. Computers and Electronics in Agriculture (2019).
- Candanedo, Feldheim, & Deramaix (2017). Data driven prediction models of energy use of appliances in a low-energy house. Energy and Buildings.
- Zamora-Martínez, Romeu, Botella-Rocamora, & Pardo (2014). On-line learning of indoor temperature forecasting models towards energy efficiency. Energy & Buildings.
- Neural machine translation with deep attention. IEEE Transactions on Pattern Analysis and Machine Intelligence (2018).
- Allen-Zhu, Z., Li, Y., & Song, Z. (2018). A convergence theory for deep learning via over-parameterization.
- Understanding deep neural networks with rectified linear units.
- Neural machine translation by jointly learning to align and translate.
- Learning phrase representations using RNN encoder-decoder for statistical machine translation.
- Empirical evaluation of gated recurrent neural networks on sequence modeling.
- Period-aware content attention RNNs for time series forecasting with missing values. Neurocomputing.
- Position-based content attention for time series forecasting with sequence-to-sequence RNNs.
- RETAIN: Interpretable predictive model in healthcare using reverse time attention mechanism. In Advances in Neural Information Processing Systems.
- Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13.
- Time-series modelling and forecasting: Modelling of rainfall prediction using ARIMA model. International Journal of Society Systems Science.
- Financial time series prediction using least squares support vector machines within the evidence framework. IEEE Transactions on Neural Networks, 12.
- Long short-term memory. Neural Computation.
- Laplacian echo state network for multivariate time series prediction. IEEE Transactions on Neural Networks and Learning Systems.
- Sequence to sequence learning with neural networks.