Correlational graph attention-based Long Short-Term Memory network for multivariate time series prediction
Introduction
Multivariate time series data are collected everywhere in our lives [1]: stock data from financial markets, photovoltaic power data from electric power companies, air and water quality data from urban sensors, etc. Multivariate time series data contain the values of a target series and multiple exogenous series. The prediction task for multivariate time series is challenging, mainly because of two factors: (1) how to obtain the spatial correlations between the target series and the multiple exogenous series, which are highly dynamic and change over time; and (2) how to capture temporal correlations across different time steps. For instance, solar irradiance, humidity, and temperature all influence photovoltaic power generation, as shown in Fig. 1. Photovoltaic power is strongly affected by solar irradiance: at night, solar irradiance is 0 and no power is generated; as the sun rises, photovoltaic power first increases and then decreases with solar irradiance. We find that the trend of photovoltaic power sometimes differs from that of solar irradiance under the influence of temperature and humidity, such as the photovoltaic power from July 15 to July 17, 2016. The importance of each exogenous series to the target series therefore differs. Moreover, most time series data are collected by sensors, which are noisy under the adverse influence of the surrounding environment. It is thus important to detect which exogenous series are useful for prediction.
Traditional methods for time series prediction include the autoregressive integrated moving average (ARIMA) [2] and vector autoregression (VAR) [3]. The ARIMA model takes only the historical target series as input and thus ignores the exogenous series. The VAR model can effectively exploit exogenous series features. Both require that the inputs be stationary and autocorrelated, such as periodic data. Although the VAR model covers richer information than ARIMA, neither can model spatio-temporal correlations, so the information they capture is limited.
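The input difference between the two model families can be made concrete with a small sketch: an AR-style model sees only lags of the target, while a VAR-style model stacks lags of every series. The helper below is purely illustrative, not from the paper.

```python
# Sketch: the difference in inputs between an AR-style model (target lags
# only) and a VAR-style model (lags of the target AND exogenous series).

def lagged_inputs(series_list, p):
    """Build rows of p stacked lags from one or more series (oldest first)."""
    T = len(series_list[0])
    rows = []
    for t in range(p, T):
        row = []
        for s in series_list:
            row.extend(s[t - p:t])
        rows.append(row)
    return rows

target = [1.0, 2.0, 3.0, 4.0, 5.0]
exog   = [0.5, 0.4, 0.6, 0.3, 0.7]

ar_rows  = lagged_inputs([target], p=2)          # AR: target history only
var_rows = lagged_inputs([target, exog], p=2)    # VAR: target + exogenous

print(ar_rows[0])   # [1.0, 2.0]
print(var_rows[0])  # [1.0, 2.0, 0.5, 0.4]
```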
Machine learning methods are reliable and have advantages in processing small, high-dimensional datasets. Among them, support vector regression [4], Markov models, Gaussian process regression [5], and Bayesian models are widely used in practice. To obtain better prediction performance, ensemble learning [6] combines multiple predictions in a weighted manner; an ensemble generally performs better than an individual predictor.
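The weighted combination idea can be sketched in a few lines: each base predictor's output is weighted inversely to its validation error, so more trustworthy predictors contribute more. The predictors and error values below are toy assumptions, not from the paper.

```python
# Minimal sketch of a weighted ensemble: weight each base prediction by
# 1/validation_error, normalized so the weights sum to 1.

def ensemble_predict(predictions, val_errors):
    inv = [1.0 / e for e in val_errors]
    total = sum(inv)
    weights = [w / total for w in inv]
    return sum(w * p for w, p in zip(weights, predictions))

# Three base predictors forecast the same target value.
preds = [10.0, 12.0, 11.0]
errs  = [2.0, 4.0, 4.0]     # validation errors (lower = more trusted)
print(ensemble_predict(preds, errs))  # 10.75
```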
In the era of big data, traditional models and machine learning methods are not well suited to multivariate time series data. With the rapid rise of deep networks, the field of time series prediction has developed further, and many studies have proven the effectiveness of deep networks in time series prediction. CNNs [7], [8], [9] and RNNs capture local features and short-term dependencies in the data. Successful variants of the RNN, such as the LSTM [10] and GRU [11], capture the long-term dependencies of time series. In recent years, newer networks such as the temporal convolutional network (TCN) [12] and memory networks [13] have shown stronger memory capacity and outperformed the LSTM and GRU on many tasks. Due to the characteristics of multivariate time series data, a single network cannot fully capture spatio-temporal correlations, which leads to low prediction accuracy. Many studies therefore combine the above networks with other methods, according to the characteristics of the data, to improve prediction accuracy.
Deep networks for time series prediction can be divided into two categories: RNN-based networks and attention-based networks. RNN-based networks employ an RNN to capture long-term dependencies. RNN-Gaussian DyBM [14] is a generative model of multivariate multi-target time series, in which RNN layers compute a high-dimensional nonlinear feature map of past input sequences for the DyBM. The parameters of the RNN-Gaussian DyBM are updated online, and its performance in offline learning is not as good as that of an RNN. CoR [15] builds stacked bidirectional RNNs and places an SGCRF [16] on top of them: the RNNs effectively express the temporal correlations of the time series, while the SGCRF learns to output structured information. The multi-variable LSTM uses tensorized hidden states to explain the importance of the exogenous series, acquiring mixed attention in space and time [17].
The Encoder–Decoder has proven very effective for seq2seq prediction problems [18], such as machine translation, speech recognition, and time series prediction. In the encoder, the input sequence is encoded into a fixed-length vector by an RNN; in the decoder, an RNN decodes the fixed-length vector and outputs the predicted sequence, so this architecture is also called the Encoder–Decoder RNN. However, as the length of the input sequence increases, the performance of the Encoder–Decoder deteriorates rapidly [19]. To address this issue, the attention-based Encoder–Decoder was proposed, which uses an attention mechanism to select the relevant parts of the sequence [20]. The attention mechanism may be used in the encoder, the decoder, or both; such models are called attention-based RNNs. Different attention mechanisms have been proposed for different problems, such as spatial attention, temporal attention, inter-attention, and self-attention. The dual-stage attention-based recurrent neural network (DA-RNN) [21] combines an input attention mechanism and a temporal attention mechanism with the Encoder–Decoder LSTM structure to achieve state-of-the-art performance, and it was enlightening work for multivariate time series prediction. GeoMAN [22] for geo-sensory time series prediction, the DTSP-RNN network [23], which realizes multi-step prediction, and the Multistage Attention Network [24], which obtains different attention weights at different stages, were all inspired by the DA-RNN and have achieved good results. CLVSA [25] is a hybrid attention network combining the self-attention [26] and inter-attention [20] mechanisms: self-attention captures the informative latent features in the encoder and decoder, while inter-attention strengthens the connection between the encoder and the decoder. These models effectively predict the value of the next time step.
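The attention mechanism these models build on can be illustrated with a minimal additive (Bahdanau-style) score in pure Python: each encoder hidden state is scored against the decoder state, the scores are softmax-normalized, and the context vector is their weighted sum. The scalar weights and two-dimensional states below are toy values for brevity, not from any of the cited models.

```python
import math

def softmax(xs):
    m = max(xs)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def additive_attention(enc_states, dec_state, v, W_e, W_d):
    # score_t = v . tanh(W_e * h_t + W_d * s)   (scalar W_e, W_d for brevity)
    scores = [sum(vk * math.tanh(W_e * h[k] + W_d * dec_state[k])
                  for k, vk in enumerate(v))
              for h in enc_states]
    alphas = softmax(scores)          # one weight per encoder time step
    context = [sum(a * h[k] for a, h in zip(alphas, enc_states))
               for k in range(len(dec_state))]
    return alphas, context

enc = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]   # one hidden state per time step
dec = [0.7, 0.3]
alphas, ctx = additive_attention(enc, dec, v=[1.0, 1.0], W_e=1.0, W_d=1.0)
print(alphas)  # weights over encoder time steps, summing to 1
```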
It is difficult for the above multivariate time series prediction methods to capture spatio-temporal correlations across multiple time steps. For instance, having collected the values of solar irradiance, humidity, temperature, and photovoltaic power up to today, we may wish to predict the photovoltaic power at certain time steps tomorrow. Multiple time steps lie between the historical information and the future predicted values, so we need a model that learns the correlations between past information and future values, i.e., spatio-temporal correlations. The Long- and Short-term Time-series Network (LSTNet) [1] uses a CNN and an RNN to predict multivariate multi-target time series with periodic patterns across multiple time steps; its Recurrent-skip component is designed to capture very long-term temporal patterns. One major downside is that the Recurrent-skip layer requires a hyperparameter that cannot be optimized within the model. To address this issue, attention-based RNN networks have been proposed, in which the attention-based RNN can be used on its own rather than within an Encoder–Decoder. TPA-LSTM [27] employs temporal pattern attention to attend to the row vectors of the LSTM hidden states, selecting the row variables that are helpful for forecasting. MTNet [28] uses attention-based LSTM encoders: it feeds long-term and short-term historical time series data to separate encoders to find the features that most closely resemble the ground truth to be predicted, and then uses another encoder to obtain the weighted memory vectors. These models work well on time series with multiple target series but without exogenous series. Attention-based RNN networks have also been widely used in other fields. To infer the causal relationships between process variables, the ALSTM-GC [29] model establishes an attention mechanism between the input and the LSTM for alarm root-cause diagnosis.
The inter-attention mechanism has been trained for multiple related document classification tasks [30]. The attention-based RNN can capture richer hidden features of user interest, which are combined to make recommendations [31].
In 2005, the theory of the graph neural network (GNN) was proposed [32]. The GNN was originally intended to solve strictly graph-structured problems, such as molecular structure classification. In fact, data in many other fields can also be converted into graph structures, such as recommender systems, machine translation, and time series. In recommender systems, the interactions between products and users are constructed as a graph, and accurate recommendations can be made by learning this graph structure. Franco Scarselli et al. [33], [34] define graph neural networks that rely on the relationships between nodes to capture information in the graph, and they can be applied in relational domains. Graph-structured data are irregular: each node has its own neighborhood structure. Deep learning methods such as CNNs and RNNs can extract the local features of a node but cannot obtain the features between nodes in a graph structure. GNNs and deep learning methods are therefore combined to process both the node features and the structural information of the graph, as in the graph convolutional network (GCN) [35], [36], the gated graph neural network (GGNN) [37], the graph LSTM [38], and the graph attention network (GAT) [39].
Our work captures the spatio-temporal correlations of multiple time series with exogenous series. We propose to exploit the idea of GAT to handle time series forecasting tasks. GAT is a novel attention mechanism that operates on graph-structured data and was first proposed for transductive and inductive learning tasks. GAT assigns different weights to different neighborhood nodes, without any expensive matrix computations or prior knowledge of the graph structure.
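The GAT weighting described above can be sketched in pure Python: the attention coefficient between node i and neighbor j is e_ij = LeakyReLU(a · [W h_i || W h_j]), normalized by a softmax over i's neighborhood. A scalar W and two-dimensional toy features are illustrative simplifications; the original GAT uses learned matrices.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0.0 else slope * x

def gat_weights(h_i, neighbors, a, W=1.0):
    """Attention weights of node i over its neighborhood (including itself)."""
    def score(h_j):
        # concatenate the (scalar-)transformed features of i and j
        concat = [W * x for x in h_i] + [W * x for x in h_j]
        return leaky_relu(sum(ak * ck for ak, ck in zip(a, concat)))
    scores = [score(h_j) for h_j in neighbors]
    m = max(scores)                   # softmax with max-subtraction
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

h_i = [0.2, 0.4]                              # features of node i
neigh = [h_i, [0.9, 0.1], [0.3, 0.3]]         # self + two neighbors
alphas = gat_weights(h_i, neigh, a=[0.5, -0.5, 0.5, -0.5])
print(alphas)  # normalized importance of each neighbor
```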
Inspired by GAT, we propose a correlational graph attention-based Long Short-Term Memory network (CGA-LSTM). The model consists of a graph attention mechanism, a correlational attention mechanism, and an LSTM. The main contributions of our work are summarized as follows:
- (1)
A nested deep network framework is proposed to capture spatio-temporal correlations for multivariate time series prediction.
- (2)
To obtain sufficient expressive power, we nest the correlational attention-based LSTM inside the graph attention mechanism, transforming the node features into higher-level features. The correlational attention mechanism adaptively selects the relevant exogenous series to obtain the spatial correlations. When calculating the correlational attention weights, a weight is first assigned to each exogenous series, determined by the target series.
- (3)
The time series data are constructed as a graph structure in this paper, with each time step represented as a node. The weight coefficients between a node and its neighbors are calculated to capture the temporal correlations. GAT assigns different weights to different time steps; in other words, historical information at different time steps has different importance for future values.
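The graph construction in contribution (3) can be sketched directly: each time step becomes a node, and its neighborhood is taken here to be the preceding `window` time steps. The exact neighborhood definition below is an assumption for illustration, not the paper's specification.

```python
# Sketch: build a {node: neighbors} adjacency for a time series of T steps,
# where node t's neighbors are the `window` time steps preceding it.

def time_series_to_graph(T, window):
    """Return {node: list of neighbor nodes} for T time steps."""
    return {t: list(range(max(0, t - window), t)) for t in range(T)}

graph = time_series_to_graph(T=5, window=2)
print(graph[4])  # [2, 3]: node 4 attends to the two previous time steps
print(graph[0])  # []: the first time step has no earlier neighbors
```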
The remaining part of the paper proceeds as follows: in Section 2 we first formulate the multivariate time series prediction problem and then introduce the LSTM and GAT. We present the CGA-LSTM framework in Section 3, followed by a rigorous experimental analysis in Section 4 and the conclusions and future work in Section 5.
Problem formulation
Multivariate time series data consist of multiple exogenous series and a target series. We represent multivariate time series data as multiple directed graphs, with T nodes in each directed graph. Given n exogenous series, the exogenous series at time t, x_t ∈ R^n, serves as the feature of node t in the graph structure, and the features of the neighboring nodes of node t represent its neighborhood.
Given the previous values of the target time series, with y_t ∈ R, and past values of
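The formulation above amounts to a supervised windowing task: map a window of past exogenous and target values to the next target value. The window length `T_w` and the sample layout below are illustrative assumptions, not the paper's exact definitions.

```python
# Sketch: given n exogenous series X (shape T x n) and the target history y
# (length T), build one training sample mapping a window of past values
# over [t - T_w, t) to the label y[t].

def make_sample(X, y, t, T_w):
    exog_window   = [X[i] for i in range(t - T_w, t)]
    target_window = [y[i] for i in range(t - T_w, t)]
    return (exog_window, target_window), y[t]

X = [[0.1, 0.5], [0.2, 0.4], [0.3, 0.6], [0.4, 0.5]]   # T=4, n=2
y = [1.0, 1.1, 1.3, 1.6]

features, label = make_sample(X, y, t=3, T_w=2)
print(features)  # ([[0.2, 0.4], [0.3, 0.6]], [1.1, 1.3])
print(label)     # 1.6
```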
Framework
Fig. 2 presents the framework of our model and illustrates the computational process of CGA-LSTM. Our model consists of two parts: one is the CGA, which is used to obtain the spatio-temporal correlations; the other is a predictor, namely an LSTM, which is used to predict future values. The CGA is a nested network composed of two attention mechanisms: (1) the graph attention mechanism, which calculates the weight coefficients between a node and its neighbors to capture the temporal correlations.
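The spatial side of the framework, the correlational attention over exogenous series, can be illustrated with a simple stand-in: weight each exogenous series by its relevance to the target series. Using Pearson correlation as the relevance score is an assumption for illustration only; in the paper these weights are learned inside the network.

```python
import math

# Illustrative stand-in for correlational attention: score each exogenous
# series by |Pearson correlation| with the target, then normalize.

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

def correlational_weights(exog_series, target):
    scores = [abs(pearson(s, target)) for s in exog_series]
    total = sum(scores)
    return [s / total for s in scores]

target = [1.0, 2.0, 3.0, 4.0]
exog = [[1.1, 2.0, 2.9, 4.2],    # tracks the target closely
        [5.0, 1.0, 4.0, 2.0]]    # weakly related
w = correlational_weights(exog, target)
print(w)  # the first series receives the larger weight
```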
Datasets
To evaluate the performance of our model, we use two different types of datasets: periodic datasets and non-periodic datasets. Each dataset contains a target series and multiple exogenous series.
Periodic data show the same fluctuation pattern in each time period:
- •
Photovoltaic (PV) Power Dataset: this is a competition dataset called National Energy Day-ahead PV power prediction. We employ PV power as the target series. The solar irradiance, wind speed, wind direction,
Conclusions
This study set out to build a correlational graph attention-based LSTM network for multivariate time series prediction across multiple time steps. In this model, each time step is regarded as a node, and the graph attention mechanism calculates the weights between a node and its neighborhood to obtain the temporal correlations. The correlational attention-based LSTM is proposed to capture the spatial correlations between the exogenous series and the target series. The model
CRediT authorship contribution statement
Shuang Han: Experiment, Software, Investigation, Writing - original draft. Hongbin Dong: Validation, Review, Investigation, Editing. Xuyang Teng: Validation, Writing - review & editing. Xiaohui Li: Validation, Writing - review & editing. Xiaowei Wang: Validation, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
We would like to acknowledge the support from the National Natural Science Foundation of China (Nos. 61472095, 61906055) and the Natural Science Foundation of Heilongjiang Province, China (No. LH2020F023).
References (48)
- et al., An integrated approach to bearing prognostics based on EEMD-multi feature extraction, Gaussian mixture models and Jensen–Rényi divergence, Appl. Soft Comput. J. (2018)
- et al., An ensemble learning-based prognostic approach with degradation-dependent weights for remaining useful life prediction, Reliab. Eng. Syst. Saf. (2019)
- et al., A survey of deep neural network architectures and their applications, Neurocomputing (2017)
- et al., Multistage attention network for multivariate time series prediction, Neurocomputing (2020)
- et al., Short-term traffic speed forecasting based on graph attention temporal convolutional networks, Neurocomputing (2020)
- et al., STMAG: A spatial–temporal mixed attention graph-based convolution model for multi-data flow safety prediction, Inf. Sci. (2020)
- et al., Modeling long- and short-term temporal patterns with deep neural networks
- et al., Time series analysis: Forecasting and control, J. Mark. Res. (1977)
- New Introduction to Multiple Time Series Analysis (2005)
- et al., A novel dynamic-weighted probabilistic support vector regression-based ensemble for prognostics of time series data, IEEE Trans. Reliab. (2015)
- A spatio-temporal attention-based spot-forecasting framework for urban traffic prediction, Appl. Soft Comput. J.
- A spatio-temporal decomposition based deep neural network for time series forecasting, Appl. Soft Comput. J.
- Long short-term memory, Neural Comput.
- An empirical evaluation of generic convolutional and recurrent networks for sequence modeling
- End-to-End Memory Networks
- Sparse Gaussian conditional random fields: Algorithms, theory, and application to energy forecasting
- Multi-variable LSTM neural network for autoregressive exogenous model
- Long short-term memory networks with Python
- On the properties of neural machine translation: Encoder–decoder approaches
- Neural machine translation by jointly learning to align and translate
- A dual-stage attention-based recurrent neural network for time series prediction