
Neurocomputing

Volume 388, 7 May 2020, Pages 269-279

Multivariate time series forecasting via attention-based encoder–decoder framework

https://doi.org/10.1016/j.neucom.2019.12.118

Abstract

Time series forecasting is an important technique for studying the behavior of temporal data and forecasting future values, and it is widely applied in many fields, e.g. air quality forecasting, power load forecasting, medical monitoring, and intrusion detection. In this paper, we propose a novel temporal attention encoder–decoder model for the multivariate time series forecasting problem. It is an end-to-end deep learning structure that integrates the traditional encoder context vector with a temporal attention vector for joint temporal representation learning: bi-directional long short-term memory (Bi-LSTM) layers equipped with a temporal attention mechanism serve as the encoder network to adaptively learn long-term dependencies and hidden correlation features of multivariate temporal data. Extensive experiments on five typical multivariate time series datasets show that our model achieves the best forecasting performance compared with baseline methods.

Introduction

Time series forecasting has received much attention in recent decades due to its important applications in many fields [1], including traffic flow forecasting [2], air pollution forecasting [3], time series anomaly detection [4], medical monitoring analysis [5], network intrusion detection, etc. Generally speaking, time series data can be described as a set of observations in chronological order. They can be divided into univariate and multivariate time series, and are typically large in scale, high dimensional, and constantly changing. Many scholars have studied the time series prediction problem, especially with classical statistical models such as ARIMA [6] and typical machine learning models like HMM [8], SVR [7], and ANN [9], [10], [11]. However, most traditional methods employ statistical models to describe the evolution of temporal data. For example, Zhang proposed a hybrid method that combines ARIMA and ANN for linear and nonlinear time series modeling [9]. Pai et al. presented a hybrid SSVR method for forecasting time series by employing SARIMA and SVR models [7]. Sapankevych and Sankar provided a survey of time series prediction applications using a classical machine learning model, Support Vector Machines (SVM), which offers insight into the advantages and challenges of using SVMs for time series forecasting [12].

With the arrival of the big data era, multivariate and multichannel massive time series data are increasing explosively. In many cases, multivariate time series data have high-dimensional and spatial-temporal dependency characteristics, or contain noise, which makes them difficult to model effectively with classical statistical methods [10]. In addition, those traditional methods face difficulties in processing big data, especially massive and complex multivariate temporal data. Therefore, data-driven time series prediction methods are increasingly favored by researchers, and progress has been made in many fields, e.g. air pollution forecasting [3,13], traffic flow prediction [14], anomaly detection [4], and urban crowd flow prediction [15]. Recently, deep learning [16,17] has been applied in many areas and has achieved state-of-the-art results on numerous benchmark datasets, e.g. image [18] and video processing [19] and natural language understanding [20]. Although traditional methods can still be used for time series modeling, deep learning-based forecasting methods are becoming more popular [21], [22], [23].

In addition, traditional time series forecasting methods face the following two challenges. First, many classical methods only address the single-step prediction problem, which limits their use in practical monitoring and early-warning applications based on time series modeling: a single-step forecast is of little help for real early warning because it says nothing about what will happen several steps ahead. Moreover, multi-step forecasting is more complicated than single-step forecasting and must account for additional issues, e.g. error accumulation and degradation of forecasting performance over the horizon [24]. Second, in many cases there are correlations between the variables of multivariate time series data, and modeling all related variables together often yields a better understanding than modeling each variable separately. It is therefore important to study prediction models built on multivariate time series data.
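To make the multi-step setting concrete, the task can be written (in our own illustrative notation; the paper's formal problem definition appears in Section 3) as mapping an input window of T past observations of all n variables directly to the next τ values, rather than iterating a one-step model whose errors compound:

\[
\left(\hat{\mathbf{x}}_{t+1}, \ldots, \hat{\mathbf{x}}_{t+\tau}\right)
  = F\!\left(\mathbf{x}_{t-T+1}, \ldots, \mathbf{x}_{t}\right),
  \qquad \mathbf{x}_i \in \mathbb{R}^{n},
\]

where T is the input window length, τ the forecasting horizon and n the number of variables. The recursive alternative applies a one-step predictor τ times, feeding each prediction back as an input, which is the main source of the error accumulation mentioned above.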

In response to the above two issues, we propose, for the first time, a Bi-LSTM based encoder–decoder with a temporal attention mechanism, which aims to adaptively learn the implicit temporal dependency features of multivariate time series data. The contributions of this paper include the following aspects:

  • 1) Firstly, we propose a novel temporal attention-based encoder–decoder model and apply it to multivariate time series multi-step forecasting tasks. A Bi-LSTM with an attention mechanism encodes the hidden representations of the multivariate time series as a temporal context vector, and a second LSTM decodes this hidden representation for prediction (a minimal code sketch of this architecture is given after this list). Through this end-to-end process, hidden long-term dependency features and non-linear correlation features can be learned from the raw multivariate time series data.

  • 2) Secondly, we introduce a temporal attention mechanism between the encoder network and the decoder network, which selects relevant encoder hidden states across all time steps so that multi-step forecasts are made more accurately, thereby improving the model's ability to represent dynamic multivariate time series data.

  • 3) Finally, we demonstrate the effectiveness of our model on five real multivariate time series datasets; the experimental results show that our model achieves the best forecasting performance compared with baseline methods.
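The following is a minimal sketch of an encoder–decoder of this kind, written in PyTorch. The layer sizes, the additive attention scoring function, the single-layer LSTM decoder, and the choice to forecast a single target variable are illustrative assumptions for exposition, not the authors' exact configuration, which is detailed in Section 3.

import torch
import torch.nn as nn


class TemporalAttentionSeq2Seq(nn.Module):
    def __init__(self, n_features, hidden_size, horizon):
        super().__init__()
        # Encoder: Bi-LSTM over the multivariate input window.
        self.encoder = nn.LSTM(n_features, hidden_size, batch_first=True,
                               bidirectional=True)
        # Additive attention scoring encoder states against the decoder state.
        self.attn_w = nn.Linear(2 * hidden_size + hidden_size, hidden_size)
        self.attn_v = nn.Linear(hidden_size, 1, bias=False)
        # Decoder: unidirectional LSTM cell driven by the attention context.
        self.decoder = nn.LSTMCell(2 * hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, 1)
        self.horizon = horizon
        self.hidden_size = hidden_size

    def forward(self, x):                       # x: (batch, T, n_features)
        enc_out, _ = self.encoder(x)            # (batch, T, 2*hidden)
        batch = x.size(0)
        h = x.new_zeros(batch, self.hidden_size)
        c = x.new_zeros(batch, self.hidden_size)
        preds = []
        for _ in range(self.horizon):
            # Temporal attention: weight every encoder hidden state.
            h_rep = h.unsqueeze(1).expand(-1, enc_out.size(1), -1)
            scores = self.attn_v(torch.tanh(
                self.attn_w(torch.cat([enc_out, h_rep], dim=-1))))
            alpha = torch.softmax(scores, dim=1)        # attention weights
            context = (alpha * enc_out).sum(dim=1)      # (batch, 2*hidden)
            h, c = self.decoder(context, (h, c))
            preds.append(self.out(h))
        return torch.cat(preds, dim=1)           # (batch, horizon)


# Example: 8-variable series, 24-step input window, 6-step-ahead forecast.
model = TemporalAttentionSeq2Seq(n_features=8, hidden_size=64, horizon=6)
y_hat = model(torch.randn(32, 24, 8))            # -> shape (32, 6)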

The rest of the paper is organized as follows: Section 2 gives an overview of related work on time series modeling and forecasting. Section 3 explains the research motivation of the proposed model, analyzes the overall architecture of the attention encoder–decoder framework, and describes the relevant theory and process details. Section 4 presents the experimental analysis: a comparative experiment on five multivariate time series datasets is performed and the results are analyzed in detail. Finally, we give the conclusions of the study and discuss future research priorities.

Section snippets

Related works

The key to time series modeling is to design effective feature representation methods for dynamic time series data, a challenging topic that faces issues such as high dimensionality, dynamic change and uncertainty; it involves a wide range of aspects and has long been a focus of researchers. Traditional time series forecasting methods often use statistical models or typical machine learning methods [1], e.g. ARIMA [6], SVR [7], HMM [8], and ANN [9]. And Ahmed et al. made a comparative

Problem and definitions

Time series data are usually sequences of values (in discrete or continuous form) measured over time. The dynamic updating, uncertainty and high dimensionality of time series make them different from other data such as images and text. Time series forecasting has always been a very important research area of data mining, whose goal is to predict how future temporal values will change. And the observation time interval of different types of time series data is often different and decided
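Since the formal definitions are truncated in this snippet, the following short sketch (our own illustration in NumPy, with hypothetical parameter names) shows how a multivariate series is typically sliced into fixed-length input windows and multi-step targets before being fed to a model of this kind:

import numpy as np

def make_windows(series, input_len, horizon, target_col=0):
    """Slice a multivariate series of shape (num_steps, num_vars) into pairs.

    Returns X of shape (num_samples, input_len, num_vars) and
    Y of shape (num_samples, horizon) holding future values of target_col.
    """
    X, Y = [], []
    for start in range(len(series) - input_len - horizon + 1):
        X.append(series[start:start + input_len])
        Y.append(series[start + input_len:start + input_len + horizon, target_col])
    return np.asarray(X), np.asarray(Y)

# Example: 1,000 steps of an 8-variable series, 24-step window, 6-step horizon.
data = np.random.randn(1000, 8)
X, Y = make_windows(data, input_len=24, horizon=6)
# X.shape == (971, 24, 8), Y.shape == (971, 6)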

Experiments

In this section, we evaluate the forecasting ability of the proposed model on five public multivariate time series datasets. Through comparison with baseline shallow learning and deep learning models, the forecasting performance and effectiveness of the proposed model are validated.
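The snippet above does not name the evaluation metrics; assuming the commonly used RMSE and MAE over all samples and forecast steps, they can be computed as in this brief sketch:

import numpy as np

def rmse(y_true, y_pred):
    # Root mean squared error over all samples and forecast steps.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    # Mean absolute error over all samples and forecast steps.
    return float(np.mean(np.abs(y_true - y_pred)))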

Conclusion and future work

In this paper, we proposed an end-to-end deep learning framework for multivariate time series forecasting, which leverages the idea of the encoder–decoder learning structure with Bi-LSTM and is augmented with a temporal attention mechanism. To the best of our knowledge, it is the first such representation method for dynamic multivariate time series data that can jointly learn the long-term temporal dependency patterns and non-linear correlation features of multivariate temporal data. Experiments on five multivariate

CRediT authorship contribution statement

Shengdong Du: Methodology, Investigation, Software, Writing - original draft. Tianrui Li: Supervision, Conceptualization, Resources, Writing - review & editing, Project administration. Yan Yang: Writing - review & editing. Shi-Jinn Horng: Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was partially supported by the National Natural Science Foundation of China (Nos. 61773324 and 61976247), by the “Center for Cyber-physical System Innovation” from The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan, and by MOST under grants 106-2221-E-011-149-MY2 and 108-2218-E-011-006.

References (49)

  • Y. Tian et al., LSTM-based traffic flow prediction with missing data, Neurocomputing (2018)
  • Y. Bao et al., Multi-step-ahead time series prediction using multiple-output support vector regression, Neurocomputing (2014)
  • M.R. Hassan et al., A fusion model of HMM, ANN and GA for stock market forecasting, Exp. Syst. Appl. (2007)
  • W. Liu et al., A survey of deep neural network architectures and their applications, Neurocomputing (2017)
  • J. Du Preez et al., Univariate versus multivariate time series forecasting: an application to international tourism demand, Int. J. Forecast. (2003)
  • Z. Qi, T. Wang, G. Song, et al., Deep air learning: interpolation, prediction, and feature analysis of fine-grained air...
  • G.E.P. Box et al., Distribution of residual autocorrelations in autoregressive-integrated moving average time series models, J. Am. Stat. Assoc. (1970)
  • S.H. Park et al., Forecasting change directions for financial time series using hidden Markov model
  • N.I. Sapankevych et al., Time series prediction using support vector machines: a survey, IEEE Comput. Intell. Mag. (2009)
  • B.S. Freeman et al., Forecasting air quality time series using deep learning, J. Air Waste Manag. Assoc. (2018)
  • Y. Lv et al., Traffic flow prediction with big data: a deep learning approach, IEEE Trans. Intell. Transp. Syst. (2015)
  • J. Zhang et al., Deep spatio-temporal residual networks for citywide crowd flows prediction
  • A. Krizhevsky et al., ImageNet classification with deep convolutional neural networks
  • A. Karpathy et al., Deep visual-semantic alignments for generating image descriptions


    Shengdong Du received the B.S. and M.S. degrees in Computer Science from Chongqing University in 2004 and 2007, respectively. He is currently a Ph.D. candidate in the School of Information Science and Technology, Southwest Jiaotong University. His research interests include data mining and machine learning.

Tianrui Li received the B.S., M.S., and Ph.D. degrees from Southwest Jiaotong University, Chengdu, China, in 1992, 1995, and 2002, respectively. He was a postdoctoral researcher with SCK•CEN, Belgium, from 2005 to 2006, and a visiting professor with Hasselt University, Belgium, in 2008, the University of Technology, Sydney, Australia, in 2009, and the University of Regina, Canada, in 2014. He is currently a Professor and the Director of the Key Laboratory of Cloud Computing and Intelligent Techniques, Southwest Jiaotong University. He has authored or coauthored more than 300 research papers in refereed journals and conferences. His research interests include big data, cloud computing, data mining, granular computing and rough sets. He is a fellow of IRSS and a senior member of ACM and IEEE.

Yan Yang received the B.S. and M.S. degrees from Huazhong University of Science and Technology, Wuhan, China, in 1984 and 1987, respectively, and the Ph.D. degree from Southwest Jiaotong University, Chengdu, China, in 2007. From 2002 to 2003 and 2004 to 2005, she was a visiting scholar with the University of Waterloo, Canada. She is currently a Professor and Vice Dean with the School of Information Science and Technology, Southwest Jiaotong University, Chengdu, China. She is an Academic and Technical Leader of Sichuan Province. Her research interests include artificial intelligence, big data analysis and mining, ensemble learning, cloud computing and service. She has participated in more than 10 high-level projects recently, and has authored and co-authored over 150 papers in journals and international conference proceedings. She also serves as the Vice Chair of the ACM Chengdu Chapter, and is a distinguished member of CCF, a senior member of CAAI, and a member of IEEE and ACM.

Shi-Jinn Horng received the B.S. degree in electronics engineering from the National Taiwan Institute of Technology, the M.S. degree in information engineering from National Central University, and the Ph.D. degree from National Tsing Hua University, in 1980, 1984, and 1989, respectively. He is currently a Chair Professor in the Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology. He has published more than 200 research papers and received many awards, notably the Distinguished Research Award from the National Science Council in Taiwan in 2004. His research interests include deep learning, biometric recognition, image processing, and information security.
