Correlational graph attention-based Long Short-Term Memory network for multivariate time series prediction

https://doi.org/10.1016/j.asoc.2021.107377

Highlights

  • A nested deep network framework is proposed for multivariate time series prediction.

  • We introduce GATs to capture the temporal correlations.

  • We propose correlational attention-based LSTM to obtain the spatial correlations and transform input features into higher-level features.

  • Correlational attention-based LSTM is nested in the graph attention mechanism.

Abstract

Multivariate time series prediction models use the historical information of multiple exogenous series to predict the future values of a target series. Current attention-based deep networks can capture the spatial correlations between the target series and multiple exogenous series, but they struggle to capture temporal correlations across multiple time steps, which are important for prediction accuracy. To that end, we propose a correlational graph attention-based Long Short-Term Memory network (CGA-LSTM), a nested network that embeds a correlational attention mechanism within a graph attention mechanism to strengthen spatio-temporal correlations. We construct the time series as a graph structure in which each node represents a time step of the exogenous series. To obtain sufficient expressive power, we replace the original linear transformation, which cannot capture spatial correlations, with a nonlinear transformation, the correlational attention-based LSTM, which maps the exogenous series into higher-level features. The correlational attention mechanism adaptively selects the relevant exogenous series to obtain the spatial correlations, and the graph attention mechanism then computes weight coefficients between each node and its neighbors to capture the temporal correlations. The proposed algorithm was tested on four datasets and compared with state-of-the-art methods. The experimental results show that our model is effective and provides higher prediction accuracy.

Introduction

Multivariate time series data are collected in many areas of daily life [1], such as stock data from financial markets, photovoltaic power data from electric power companies, and air and water quality data from urban sensors. Multivariate time series data contain the values of a target series and multiple exogenous series. The prediction task for multivariate time series is very challenging, mainly because of two factors: (1) how to obtain the spatial correlations, which are highly dynamic and change over time, between the target series and the multiple exogenous series; and (2) how to capture temporal correlations across different time steps. For instance, solar irradiance, humidity, and temperature all influence photovoltaic power generation, as shown in Fig. 1. Photovoltaic power is strongly affected by solar irradiance: at night, solar irradiance is zero and no power is generated; as the sun rises, photovoltaic power first increases and then decreases with solar irradiance. We also find that the trend of photovoltaic power sometimes differs from that of solar irradiance under the influence of temperature and humidity, for example from July 15 to July 17, 2016. The importance of each exogenous series to the target series therefore differs. Moreover, most time series data are collected by sensors and are noisy because of adverse environmental conditions, so it is important to identify the exogenous series that are useful for prediction.

The autoregressive integrated moving average (ARIMA) model [2], the vector autoregressive (VAR) model [3], and related approaches are traditional methods for time series prediction. The ARIMA model takes only the historical target series as input and ignores the exogenous series, whereas the VAR model can make effective use of exogenous features. Both require the inputs to be stationary and autocorrelated, such as periodic data. Although the VAR model covers richer information than ARIMA, neither can model spatio-temporal correlations, so the information they exploit is limited.
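To make the contrast concrete, below is a minimal sketch of fitting ARIMA on the target series alone versus VAR on the target plus exogenous series, using statsmodels; the file name and column names (pv_power, irradiance, temperature) are illustrative placeholders, not the paper's datasets.

```python
# Sketch: ARIMA uses only the target's own history; VAR jointly models
# the target and exogenous series but still assumes stationary inputs.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.api import VAR

df = pd.read_csv("multivariate_series.csv")  # hypothetical file

# ARIMA: target series only.
arima = ARIMA(df["pv_power"], order=(2, 1, 1)).fit()
arima_forecast = arima.forecast(steps=24)

# VAR: target plus exogenous series, differenced toward stationarity.
var_data = df[["pv_power", "irradiance", "temperature"]].diff().dropna()
var = VAR(var_data).fit(maxlags=24, ic="aic")
var_forecast = var.forecast(var_data.values[-var.k_ar:], steps=24)
```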

Machine learning methods are reliable and have advantages when processing small, high-dimensional datasets. Among them, support vector regression [4], Markov models, Gaussian process regression [5], and Bayesian models are widely used in the real world. To obtain better prediction performance, ensemble learning [6] combines multiple predictions in a weighted manner and generally performs better than an individual predictor.
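As a small illustration of the weighted combination idea, the sketch below averages two scikit-learn regressors with hand-picked weights; the data and the weights are placeholders for illustration only.

```python
# Sketch: weighted ensemble of individual regressors.
import numpy as np
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor

X_train, y_train = np.random.rand(200, 5), np.random.rand(200)  # placeholder data
X_test = np.random.rand(50, 5)

models = [SVR(C=1.0, epsilon=0.1), GaussianProcessRegressor()]
weights = np.array([0.6, 0.4])  # e.g. chosen from validation error

preds = np.stack([m.fit(X_train, y_train).predict(X_test) for m in models])
ensemble_pred = weights @ preds  # weighted combination of the predictors
```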

In the era of big data, traditional models and machine learning methods are not well suited to multivariate time series data. With the rapid rise of deep networks, the field of time series prediction has developed further, and many studies have demonstrated the effectiveness of deep networks for time series prediction. CNNs [7], [8], [9] and RNNs can capture local features and short-term dependencies in the data. Successful RNN variants, such as LSTM [10] and GRU [11], can capture the long-term dependencies of time series. In recent years, newer networks such as the temporal convolutional network (TCN) [12] and memory networks [13] have shown stronger memory capacity and outperform LSTM and GRU on many tasks. Owing to the characteristics of multivariate time series data, a single network cannot fully capture spatio-temporal correlations, which leads to low prediction accuracy. Many studies therefore combine the above networks with other methods, according to the characteristics of the data, to improve prediction accuracy.

Deep networks for time series prediction can be divided into two categories: RNN-based networks and attention-based networks. RNN-based networks employ RNNs to capture long-term dependencies. RNN-Gaussian DyBM [14] is a generative model for multivariate, multi-target time series in which RNN layers compute a high-dimensional nonlinear feature map of past input sequences and feed it to the DyBM; its parameters are updated online, and its performance is not as good as an RNN trained offline. CoR [15] builds stacked bi-directional RNNs with an SGCRF [16] on top: the RNNs effectively express the temporal correlations of the time series data, while the SGCRF learns to output structured information. The multi-variable LSTM uses tensorized hidden states to explain the importance of the exogenous series and acquires mixed attention over the spatial and temporal dimensions [17].

The Encoder–Decoder architecture has proven very effective for seq2seq prediction problems [18] such as machine translation, speech recognition, and time series prediction. In the encoder, the input sequence is encoded into a fixed-length vector by an RNN; in the decoder, an RNN decodes this vector and outputs the predicted sequence, so the architecture is also called the Encoder–Decoder RNN. However, as the length of the input sequence increases, the performance of the Encoder–Decoder deteriorates rapidly [19]. To address this issue, the attention-based Encoder–Decoder was proposed, which uses an attention mechanism to select the relevant parts of the sequence [20]. The attention mechanism can be used in the encoder, the decoder, or both, giving attention-based RNNs. Different attention mechanisms have been proposed for different problems, such as spatial attention, temporal attention, inter-attention, and self-attention. The dual-stage attention-based recurrent neural network (DA-RNN) [21] combines an input attention mechanism and a temporal attention mechanism with an Encoder–Decoder LSTM to achieve advanced performance, and it was enlightening work for multivariate time series prediction. GeoMAN [22], designed for geo-sensory time series, the DTSP-RNN network [23], which realizes multi-step prediction, and the multistage attention network [24], which obtains different attention weights at different stages, were all developed under the inspiration of DA-RNN and have achieved good results. CLVSA [25] is a hybrid attention network combining self-attention [26] and inter-attention [20]: self-attention captures the informative latent features in the encoder and decoder, while inter-attention strengthens the connection between them. These models are effective for predicting the value at the next time step.
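For concreteness, here is a minimal PyTorch sketch of the attention-based Encoder–Decoder idea described above: the decoder attends over all encoder hidden states instead of relying on a single fixed-length vector. Layer sizes and the additive scoring function are illustrative choices, not the exact design of any of the cited models.

```python
# Sketch: one-step prediction with an attention-based encoder-decoder.
import torch
import torch.nn as nn


class AttnEncoderDecoder(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)       # scores each encoder step
        self.decoder_cell = nn.LSTMCell(hidden, hidden)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):                           # x: (batch, T, n_features)
        enc_out, (h, c) = self.encoder(x)           # enc_out: (batch, T, hidden)
        h, c = h.squeeze(0), c.squeeze(0)

        # Additive attention: score each encoder state against the decoder state.
        query = h.unsqueeze(1).expand_as(enc_out)
        scores = self.attn(torch.cat([enc_out, query], dim=-1)).squeeze(-1)
        weights = torch.softmax(scores, dim=1)      # (batch, T)

        context = (weights.unsqueeze(-1) * enc_out).sum(dim=1)  # weighted sum
        h, c = self.decoder_cell(context, (h, c))
        return self.out(h)                          # predicted next value


y_hat = AttnEncoderDecoder(n_features=4)(torch.randn(8, 24, 4))
```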

For the above multivariate time series prediction methods, it is difficult to capture spatio-temporal correlations across multiple time steps. For instance, given the values of solar irradiance, humidity, temperature, and photovoltaic power collected before today, we wish to predict the photovoltaic power at several time steps tomorrow. Multiple time steps separate the historical information from the future predicted values, so we need a model that learns the correlations between past information and future predictions, i.e. the spatio-temporal correlations. The Long- and Short-term Time-series Network (LSTNet) [1] uses a CNN and an RNN to predict multivariate, multi-target time series with periodic patterns across multiple time steps; its Recurrent-skip component is designed to capture very long-term temporal patterns. One major downside is that the Recurrent-skip layer requires a hyperparameter that cannot be optimized within the model. To address this issue, attention-based RNN networks have been proposed; the attention-based RNN can be used on its own rather than within an Encoder–Decoder. TPA-LSTM [27] employs temporal pattern attention (TPA) to attend to the row vectors of the LSTM hidden states, selecting the row variables that are helpful for forecasting. MTNet [28] uses three attention-based LSTM encoders: long-term and short-term historical time series are fed to Encoder_m and Encoder_in, respectively, to find the features that most closely resemble the ground truth to be predicted, and another encoder, Encoder_c, obtains the weighted memory vectors. These models work well for time series with multiple target series but without exogenous series. Attention-based RNN networks have also been widely used in other fields: to infer the causal relationships between process variables, the ALSTM-GC [29] model places an attention mechanism between the input and the LSTM for alarm root-cause diagnosis; the inter-attention mechanism has been trained for multiple related document classification tasks [30]; and attention-based RNNs can capture richer hidden features of user interest, which are then combined to make recommendations [31].

The theory of graph neural networks (GNNs) was proposed in 2005 [32]. GNNs were originally designed to solve strictly graph-structured problems, such as molecular structure classification. In fact, data in many fields can also be converted into graph structures, for example in recommendation systems, machine translation, and time series. In recommendation systems, the interactions between products and users are constructed as a graph, and accurate recommendations can be made by learning this graph structure. Scarselli et al. [33], [34] define graph neural networks that rely on the relationships between nodes to capture information in the graph; they can be applied in relational domains. Graph-structured data are irregular: each node has its own neighborhood structure. Deep learning methods such as CNNs and RNNs can extract the local features of a node but cannot capture the relations between nodes in the graph structure. GNNs are therefore combined with deep learning methods to process both the node features and the structural information in the graph, as in the graph convolutional network (GCN) [35], [36], the gated graph neural network (GGNN) [37], graph LSTM [38], and the graph attention network (GAT) [39].

Our work captures the spatio-temporal correlations of multivariate time series with exogenous series. We propose to exploit the idea of GAT to handle time series forecasting tasks. GAT is a novel attention mechanism that operates on graph-structured data; it was first proposed for transductive and inductive learning tasks. GAT assigns different weights to different neighborhood nodes without any expensive matrix computations or prior knowledge of the graph structure.
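For concreteness, below is a single-head graph attention layer in the spirit of GAT [39], sketched in PyTorch: each node's new representation is a weighted sum of its neighbors' linearly transformed features, with the weights produced by a learned attention score. This is an illustrative sketch rather than the original implementation.

```python
# Sketch: one graph attention layer over node features and an adjacency matrix.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttentionLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # shared linear transform
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # attention scorer

    def forward(self, h, adj):
        # h: (N, in_dim) node features, adj: (N, N) 0/1 adjacency (with self-loops).
        Wh = self.W(h)                                     # (N, out_dim)
        N = Wh.size(0)
        pairs = torch.cat([Wh.unsqueeze(1).expand(N, N, -1),
                           Wh.unsqueeze(0).expand(N, N, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))        # raw attention scores
        e = e.masked_fill(adj == 0, float("-inf"))         # keep only real edges
        alpha = torch.softmax(e, dim=-1)                   # normalized per node
        return alpha @ Wh                                  # weighted neighbor sum
```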

Inspired by GAT, we propose a correlational graph attention-based Long Short-Term Memory network (CGA-LSTM). The model consists of a graph attention mechanism, a correlational attention mechanism, and an LSTM. The main contributions of our work are summarized as follows:

  • (1)

    A nested deep network framework is proposed to capture spatio-temporal correlations for multivariate time series prediction.

  • (2)

    To obtain sufficient expressive power, we nest a correlational attention-based LSTM in the graph attention mechanism to transform the node features into higher-level features. The correlational attention mechanism adaptively selects the relevant exogenous series to obtain the spatial correlations. When computing the correlational attention weights, the weights assigned to the exogenous series are determined by the target series.

  • (3)

    The time series data are constructed as a graph structure in which each time step is a node. The weight coefficients between a node and its neighbors are computed to capture the temporal correlations; GAT assigns different weights to different time steps (the standard form of these coefficients is sketched below). In other words, the importance of historical information for future values differs across time steps.
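For reference, the attention coefficients in the original GAT [39] take the form below, where $\mathbf{h}_i$ is the feature vector of node (time step) $i$, $\mathbf{W}$ is a shared linear transform, $\mathbf{a}$ is the attention vector, $\|$ denotes concatenation, and $\mathcal{N}_i$ is the neighborhood of node $i$; in CGA-LSTM the linear transform $\mathbf{W}$ is replaced by the correlational attention-based LSTM.

```latex
\alpha_{ij} =
  \frac{\exp\bigl(\mathrm{LeakyReLU}\bigl(\mathbf{a}^{\top}
        [\mathbf{W}\mathbf{h}_i \,\|\, \mathbf{W}\mathbf{h}_j]\bigr)\bigr)}
       {\sum_{k \in \mathcal{N}_i}
        \exp\bigl(\mathrm{LeakyReLU}\bigl(\mathbf{a}^{\top}
        [\mathbf{W}\mathbf{h}_i \,\|\, \mathbf{W}\mathbf{h}_k]\bigr)\bigr)}
```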

The remaining part of the paper proceeds as follows: In Section 2, we first formulate the multivariate time series prediction problem and then introduce LSTM and GAT. We present the CGA-LSTM framework in Section 3, followed by the rigorous experimental analysis in Section 4, and the conclusions and future work in Section 5.

Section snippets

Problem formulation

Multivariate time series data consist of multiple exogenous series and a target series. We represent the multivariate time series data as multiple directed graphs, each containing $T$ nodes. Given $n$ exogenous series, the exogenous series at time $t$, $\mathbf{x}_t = (x_t^1, x_t^2, \ldots, x_t^n) \in \mathbb{R}^n$, serves as the feature of node $t$ in the graph structure, and $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{t-1}\}$ represents the features of the neighborhood nodes of node $t$.

Given the previous values of the target time series, $(y_1, y_2, \ldots, y_T)$ with $y_t \in \mathbb{R}$, and past values of
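Although the snippet above is cut off, models in this family typically frame the task as learning a nonlinear mapping $F(\cdot)$ from the histories of the target and exogenous series to the next target value; a standard way to write it (following, e.g., DA-RNN [21]) is:

```latex
\hat{y}_{T+1} = F\left(y_1, \ldots, y_T,\; \mathbf{x}_1, \ldots, \mathbf{x}_T\right)
```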

Framework

Fig. 2 presents the framework of our model and illustrates the computational process of CGA-LSTM. Our model consists of two parts: CGA, which is used to obtain the spatio-temporal correlations, and a predictor, namely an LSTM, which is used to predict future values. CGA is a nested network composed of two attention mechanisms: (1) the graph attention mechanism, which computes the weight coefficients between each node and its neighbors to capture the temporal correlations.
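To make the nesting concrete, below is a conceptual PyTorch sketch of the data flow described above: each time step is a node whose features are the exogenous values at that step, a correlational-attention LSTM replaces the linear transform to produce higher-level node features, graph attention weights the nodes, and an LSTM predictor produces the forecast. This is an illustrative reconstruction under assumed dimensions, not the authors' released code.

```python
# Sketch: correlational attention nested inside graph attention, with an LSTM predictor.
import torch
import torch.nn as nn


class CorrelationalAttentionLSTM(nn.Module):
    """Re-weights exogenous series per time step, then encodes with an LSTM."""
    def __init__(self, n_exo: int, hidden: int):
        super().__init__()
        self.score = nn.Linear(n_exo + 1, n_exo)   # target value guides the weights
        self.lstm = nn.LSTM(n_exo, hidden, batch_first=True)

    def forward(self, x, y):                        # x: (B, T, n_exo), y: (B, T, 1)
        alpha = torch.softmax(self.score(torch.cat([x, y], dim=-1)), dim=-1)
        h, _ = self.lstm(alpha * x)                 # spatially re-weighted series
        return h                                    # (B, T, hidden) higher-level node features


class CGALSTMSketch(nn.Module):
    def __init__(self, n_exo: int, hidden: int = 64):
        super().__init__()
        self.node_transform = CorrelationalAttentionLSTM(n_exo, hidden)
        self.attn = nn.Linear(2 * hidden, 1)        # graph attention over time-step nodes
        self.predictor = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x, y):
        nodes = self.node_transform(x, y)           # nonlinear transform of node features
        q = nodes[:, -1:, :].expand_as(nodes)       # last node attends to its neighbors
        scores = self.attn(torch.cat([nodes, q], dim=-1)).squeeze(-1)
        alpha = torch.softmax(scores, dim=-1)       # temporal correlation weights
        context = alpha.unsqueeze(-1) * nodes       # weighted node features
        h, _ = self.predictor(context)
        return self.out(h[:, -1])                   # one-step-ahead prediction


model = CGALSTMSketch(n_exo=4)
pred = model(torch.randn(8, 24, 4), torch.randn(8, 24, 1))
```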

Datasets

To evaluate the performance of our model, we use two different types of datasets: periodic datasets and non-periodic datasets. Each dataset contains a target series and multiple exogenous series.

Periodic data exhibit the same fluctuation pattern in each time period:

  • Photovoltaic (PV) Power Dataset: This is competition data from the National Energy Day-ahead PV Power Prediction contest. We use PV power as the target series. The solar irradiance, wind speed, wind direction,

Conclusions

This study set out to build a correlational graph attention-based LSTM network for multivariate time series prediction across multiple time steps. In this model, each time step is regarded as a node, and the graph attention mechanism calculates the weights between a node and its neighborhood to obtain the temporal correlations. To obtain the spatial correlations between the exogenous series and the target series, a correlational attention-based LSTM is proposed. The model

CRediT authorship contribution statement

Shuang Han: Experiment, Software, Investigation, Writing - original draft. Hongbin Dong: Validation, Review, Investigation, Editing. Xuyang Teng: Validation, Writing - review & editing. Xiaohui Li: Validation, Writing - review & editing. Xiaowei Wang: Validation, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We would like to acknowledge the support from the National Natural Science Foundation of China (Nos. 61472095, 61906055) and the Natural Science Foundation of Heilongjiang Province (No. LH2020F023), China.

References

  • R. de Medrano et al., A spatio-temporal attention-based spot-forecasting framework for urban traffic prediction, Appl. Soft Comput. J. (2020).
  • R. Asadi et al., A spatio-temporal decomposition based deep neural network for time series forecasting, Appl. Soft Comput. J. (2020).
  • S. Hochreiter et al., Long short-term memory, Neural Comput. (1997).
  • J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Gated feedback recurrent neural networks, in: 32nd International Conference...
  • S. Bai et al., An empirical evaluation of generic convolutional and recurrent networks for sequence modeling (2018).
  • S. Sukhbaatar et al., End-to-end memory networks (2015).
  • S. Dasgupta, T. Osogami, Nonlinear dynamic Boltzmann machines for time-series prediction, in: 31st AAAI Conf. Artif....
  • X. Wang, M. Zhang, F. Ren, Sparse Gaussian conditional random fields on top of recurrent neural networks, in: 32nd AAAI...
  • M. Wytock et al., Sparse Gaussian conditional random fields: Algorithms, theory, and application to energy forecasting (2013).
  • T. Guo et al., Multi-variable LSTM neural network for autoregressive exogenous model (2018).
  • J. Brownlee, Long Short-Term Memory Networks with Python.
  • K. Cho et al., On the properties of neural machine translation: Encoder–Decoder approaches (2015).
  • D. Bahdanau et al., Neural machine translation by jointly learning to align and translate (2015).
  • Y. Qin et al., A dual-stage attention-based recurrent neural network for time series prediction.