DSTP-RNN: A dual-stage two-phase attention-based recurrent neural network for long-term and multivariate time series prediction

https://doi.org/10.1016/j.eswa.2019.113082

Highlights

  • We propose DSTP-RNN and DSTP-RNN-Ⅱ for long-term time series prediction.

  • We enhance the attention to spatio-temporal relationships of time series.

  • We study the deep spatial attention mechanism and provide an interpretation.

  • Our methods outperform nine baseline methods on four datasets.

Abstract

Long-term prediction of multivariate time series remains an important but challenging problem. The key to solving it is capturing (1) the spatial correlations between different series at the same time, (2) the spatio-temporal relationships at different times, and (3) the long-term dependency of the temporal relationships between different series. Attention-based recurrent neural networks (RNNs) can effectively represent and learn the dynamic spatio-temporal relationships between exogenous series and target series, but they perform well only in one-step and short-term prediction. In this paper, inspired by the human attention mechanism, including the dual-stage two-phase (DSTP) model and the influence mechanism of target and non-target information, we propose the DSTP-based RNN (DSTP-RNN) and DSTP-RNN-Ⅱ for long-term time series prediction. Specifically, we first propose the DSTP-based structure to enhance the spatial correlations between exogenous series: the first phase produces violent but decentralized response weights, while the second phase leads to stationary and concentrated response weights. Then, we employ multiple attentions on the target series to boost the long-term dependency. Finally, we study the performance of the deep spatial attention mechanism and provide an interpretation. Experimental results demonstrate that the present work can be successfully used to develop expert or intelligent systems for a wide range of applications, with state-of-the-art performance superior to nine baseline methods on four datasets from the fields of energy, finance, environment and medicine. Overall, the present work carries significant value not merely in the domain of machine intelligence and deep learning, but also in many application fields.

Introduction

With the development of the Internet of Things and Big Data, data collection for specific objects is carried out across a number of different feature dimensions (Le & Ge, 2019). Time series are therefore mostly presented in multivariate form, as in energy consumption forecasting (Candanedo, Feldheim, & Deramaix, 2017), financial market prediction (Moews, Herrmann, & Ibikunle, 2018; Qin, Song, Cheng, Cheng, & Cottrell, 2017), environment forecasting (Zamora-Martínez, Romeu, Botella-Rocamora, & Pardo, 2014), and heart and brain signal analysis (Fernandez-Fraga, Aceves-Fernandez, Pedraza-Ortega, & Ramos-Arreguin, 2018). Moreover, single-step or short-term prediction of time series has limited application prospects in many fields; long-term prediction is more meaningful. For example, compared with predicting the value at the next moment, it makes more sense for expert and intelligent systems to predict weather or energy changes over a period of time.

However, long-term prediction of multivariate time series is still a challenging problem, mainly reflected in the feature representation and selection mechanisms for the spatio-temporal relationships between different series. Specifically, the three major challenges, presented in Fig. 1(c), are representing and learning (1) the spatial correlations between different attributes at the same time, (2) the spatio-temporal relationships between different attributes at different times, and (3) the temporal relationships between different series (Monidipa & Ghosh, 2019; Qin et al., 2017; Yunzhe et al., 2018).

Although time series prediction has attracted wide attention in the research community, typical methods, e.g., autoregressive integrated moving average (ARIMA) models (Amini, Kargarian, & Karabasoglu, 2016), kernel methods (Jie & Zio, 2016), and RNN methods (Chen, Xin, She, & Min, 2017), mainly focus on solving one aspect of the dynamic spatio-temporal relationships. Hence, these methods cannot achieve accurate and robust long-term prediction of multivariate time series. Moreover, attention-based RNNs can effectively represent and learn spatio-temporal correlations in time series, but these methods have been successfully applied only to single-step and short-term prediction (Qin et al., 2017; Yuxuan, Songyu, Junbo, Xiuwen, & Yu, 2018). The first motivation of this paper, therefore, is to develop an intelligent model for representing and learning the spatio-temporal relationships in time series, which can achieve accurate long-term prediction and provide a reliable expert and intelligent system in the aforementioned fields.

The proposed intelligent models are inspired by the DSTP model of human attention structures and the target and non-target information mechanism of human neuron signals (Ronald, Marco, & Carola, 2010). On the one hand, in the first stage of the DSTP model, the first phase produces a violent but decentralized response, while the second phase produces a stationary and concentrated response (Ronald et al., 2010). When designing artificial neural network structures, the two stages can be embodied as spatial attention and temporal attention, and the two phases can be reflected in the repeated filtering of spatial correlations within the spatial attention stage. Hence, we propose the DSTP-RNN model with a dual-stage two-phase attention structure to learn more robust spatio-temporal relationships in time series. On the other hand, the stimulation of neuron signals shows that both target signals and non-target signals have a certain effect, because perceptual filtering is imperfect (Ronald et al., 2010). In fact, the supervised dataset reconstruction of the target series is the key to applying traditional machine learning methods to time series prediction, which shows the importance of the past information of the target series. Meanwhile, the RNN method, which forecasts future values based on past values, also shows that the time dependency relies on the series' own past information. Therefore, we further develop the DSTP-RNN-Ⅱ model, which pays more attention to the spatio-temporal relationships between target series and exogenous series.

Furthermore, the attention mechanism of human vision has a multi-layer neuron structure, which is widely exploited in natural language processing (Vaswani et al., 2017) and computer vision (Li, Zeng, Shan, & Chen, 2018). Naturally, we study a deep attention mechanism within the spatial attention. Considering the development of attention mechanisms in deep learning, the second motivation of this paper is to study novel attention structures suitable for representing and learning the spatio-temporal relationships in time series. Consequently, we study the hierarchical attention mechanism (DSTP-RNN), the hierarchical and parallel hybrid attention mechanism (DSTP-RNN-Ⅱ), and the deep attention mechanism (DeepAttn).

To achieve these two motivations, we enhance the focus on spatial correlations through the DSTP-based model, and enhance the attention to temporal relationships through the embedding of target information, thus capturing more accurate spatio-temporal relationships in time series prediction. The contributions of our work are four-fold:

  • DSTP-RNN. Inspired by the DSTP model of human attention (Ronald et al., 2010), we propose DSTP-RNN to represent and learn robust spatio-temporal relationships in time series. The two phases refer to two consecutive attention modules, without and with the target series, that yield spatial correlations; the two phases differ in their susceptibility to interference. The dual stages refer to the spatial attention mechanism applied to the original series and the temporal attention mechanism applied to the hidden states of the last spatial attention.

  • Target and non-target information mechanism. Enlightened by the target and non-target information mechanism of human neuron signals (Ronald et al., 2010), we develop DSTP-RNN-Ⅱ to extract the spatial correlations between target series and exogenous series based on a parallel spatial attention module. Furthermore, we attend more closely to the past information of the target series to better learn long-term dependency. Specifically, we embed the past information of the target series corresponding to the exogenous series in the last-phase spatial attention module.

  • Deep spatial attention. Motivated by the multi-layer structure of biological neural networks (Fukushima & Miyake, 1982), we further study the effectiveness of a deep spatial attention mechanism on spatio-temporal relationships and provide interpretation experiments. Overall, the present paper systematically provides a reference for expert and intelligent systems in time series prediction based on attention-based RNN methods, since seven attention-based RNN models, including the three newly proposed models, are compared.

  • Application in many fields. Experimental results demonstrate that the present work can be successfully used to develop expert and intelligent systems for a wide range of applications, with state-of-the-art performance superior to nine baseline methods on four datasets from the fields of energy, finance, environment and medicine.

Section snippets

Related work

Our work is mainly related to two lines of research: time series prediction methods and attention-based neural network structures.

Notation

Given n (n ≥ 1) exogenous series and one target series, we use x^k = (x^k_1, x^k_2, …, x^k_T)^⊤ ∈ ℝ^T to represent the k-th exogenous series within window size T, and use X = (x^1, x^2, …, x^n)^⊤ = (x_1, x_2, …, x_T) ∈ ℝ^{n×T} to represent all exogenous series within window size T, where x_t ∈ ℝ^n collects the n exogenous values at time t.

As for the notation related to the target series, we employ Y = (y_1, y_2, …, y_T)^⊤ ∈ ℝ^T to represent the target series within window size T, and employ Z = (z_1, z_2, …, z_T) ∈ ℝ^{(n+1)×T} to represent the set of outputs of the first-phase attention (…
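The notation above maps directly onto array shapes. The following NumPy sketch of slicing n exogenous series and one target series into supervised windows of length T is illustrative only — the function name `make_windows` and the toy data are not from the paper:

```python
import numpy as np

def make_windows(exog, target, T):
    """Slice n exogenous series and one target series into supervised
    windows of length T: each sample is X in R^{n x T} (all exogenous
    series in the window), Y in R^T (past target values in the window),
    and the target value right after the window."""
    n, total = exog.shape
    X, Y, y_next = [], [], []
    for t in range(total - T):
        X.append(exog[:, t:t + T])    # exogenous window, shape (n, T)
        Y.append(target[t:t + T])     # target window, shape (T,)
        y_next.append(target[t + T])  # value to predict
    return np.stack(X), np.stack(Y), np.array(y_next)

# toy data: n = 3 exogenous series observed over 100 time steps
rng = np.random.default_rng(0)
exog = rng.normal(size=(3, 100))
target = rng.normal(size=100)

X, Y, y_next = make_windows(exog, target, T=10)
print(X.shape, Y.shape, y_next.shape)  # (90, 3, 10) (90, 10) (90,)
```

This is the standard "supervised dataset reconstruction" the introduction refers to: each window carries both the exogenous matrix X and the past target series Y that DSTP-RNN-Ⅱ feeds into its second attention phase.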

Models

Fig. 1(a) and (b) present the overall frameworks of the proposed DSTP-RNN and DSTP-RNN-Ⅱ, respectively. The dual stages refer to the learning of spatial correlations in the first stage and the learning of temporal relationships in the second stage, named spatial attention (red boxes in Fig. 1) and temporal attention (blue boxes in Fig. 1), respectively. The spatial attention module consists of a two-phase structure. The first phase produces violent but decentralized response weights from the …
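As a rough illustration of the two-phase spatial attention described above, here is a minimal PyTorch sketch. It captures only the high-level idea — score and reweight each series, then re-attend with the past target series concatenated in — and omits the encoder hidden-state conditioning used in the paper; the module name and layer sizes are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class SpatialAttentionPhase(nn.Module):
    """One phase of spatial attention: score each of the input series
    over a window of length T, softmax across series, and reweight."""
    def __init__(self, T):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(T, T), nn.Tanh(), nn.Linear(T, 1))

    def forward(self, series):                    # series: (batch, n, T)
        alpha = torch.softmax(self.score(series).squeeze(-1), dim=1)  # (batch, n)
        return alpha.unsqueeze(-1) * series, alpha

T, n, batch = 10, 3, 4
phase1 = SpatialAttentionPhase(T)       # first phase: exogenous series only
phase2 = SpatialAttentionPhase(T)       # second phase: plus target series

x = torch.randn(batch, n, T)            # exogenous series X
y = torch.randn(batch, 1, T)            # past target series Y
z1, a1 = phase1(x)                      # "violent but decentralized" weights
z2, a2 = phase2(torch.cat([z1, y], dim=1))  # "stationary and concentrated" weights
print(z2.shape, a2.shape)               # torch.Size([4, 4, 10]) torch.Size([4, 4])
```

The n + 1 rows of the second-phase output correspond to the set Z ∈ ℝ^{(n+1)×T} in the Notation section: the reweighted exogenous series plus the embedded target series.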

Experiments

We implement all proposed models and the neural network baseline methods in the PyTorch framework. In this section, we first describe four datasets from different fields and introduce the baseline methods. Then, we present the hyperparameter settings and model evaluation metrics. Finally, extensive experiments demonstrate the effectiveness of our models. In particular, we compare the effect of each module on the experimental results, and we also provide an interpretation of the attention-based …

Conclusion and future work

In this paper, we propose two novel attention-based RNNs for long-term and multivariate time series prediction, i.e., DSTP-RNN and DSTP-RNN-Ⅱ. In general, our models enhance the attention mechanism over both spatial correlations and temporal relationships to better learn spatio-temporal relationships, and thus outperform the state-of-the-art methods on four datasets and at different prediction horizons. Our interpretation of the attention-based models provides a developed idea for further …

CRediT authorship contribution statement

Yeqi Liu: Conceptualization, Methodology, Writing - original draft. Chuanyang Gong: Data curation, Writing - review & editing. Ling Yang: Data curation, Writing - review & editing. Yingyi Chen: Funding acquisition, Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work is supported by the National Key Research and Development Program of China "Next generation precision aquaculture: R&D on intelligent measurement, control and equipment technologies" (no. 2017YFE0122100), and the Science and Technology Program of Beijing "Research and demonstration of technologies and equipment for intelligent control of large-scale healthy cultivation of freshwater fish" (no. Z171100001517016).

References (54)

  • Bahdanau, D., et al. Neural machine translation by jointly learning to align and translate.
  • Bontempi, G., et al. (2013).
  • Chang, Y.Y., Sun, F.Y., Wu, Y.H., & Lin, S.D. (2018). A memory-network based solution for multivariate time-series...
  • Cho, K., et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation.
  • Cho, K., Van Merriënboer, B., Bahdanau, D., & Bengio, Y. (2014). On the properties of neural machine translation:...
  • Chung, J., et al. Empirical evaluation of gated recurrent neural networks on sequence modeling.
  • Cinar, Y.G., et al. (2018). Period-aware content attention RNNs for time series forecasting with missing values. Neurocomputing.
  • Cinar, Y.G., et al. (2017). Position-based content attention for time series forecasting with sequence-to-sequence RNNs.
  • Choi, E., et al. (2016). RETAIN: Interpretable predictive model in healthcare using reverse time attention mechanism. In Advances in Neural Information Processing Systems.
  • Fukushima, K., et al. (1982). Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13.
  • Gangi, M.A.D., & Federico, M. (2018). Deep neural machine translation with weakly-recurrent units....
  • Geetha, A., et al. (2016). Time-series modelling and forecasting: Modelling of rainfall prediction using ARIMA model. International Journal of Society Systems Science.
  • Gestel, T., et al. (2001). Financial time series prediction using least squares support vector machines within the evidence framework. IEEE Transactions on Neural Networks, 12.
  • Graves, A. (1997). Long short-term memory. Neural Computation.
  • Guo, T., & Lin, T. (2018). Multi-variable LSTM neural network for autoregressive exogenous model....
  • Han, M., et al. (2018). Laplacian echo state network for multivariate time series prediction. IEEE Transactions on Neural Networks and Learning Systems.
  • Sutskever, I., et al. Sequence to sequence learning with neural networks.
