research-article · DOI: 10.1145/3292500.3330662

Multi-Horizon Time Series Forecasting with Temporal Attention Learning

Published: 25 July 2019

ABSTRACT

We propose a novel data-driven approach to multi-horizon probabilistic forecasting that predicts the full distribution of a time series over future horizons. We show that temporal patterns hidden in historical data play an important role in accurately forecasting long time series. Traditional methods rely on manually specified temporal dependencies to exploit related patterns in the history, which is impractical for long-term forecasting on real-world data. Instead, we explicitly learn representations of these hidden patterns with deep neural networks and attend to different parts of the history when forecasting the future.
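The abstract does not state the training objective, but probabilistic forecasters that output quantiles are commonly trained with the pinball (quantile) loss. The following sketch is illustrative only; the function names and toy values are assumptions, not taken from the paper:

```python
def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss for a single quantile level q in (0, 1).

    Under-prediction is penalized by q and over-prediction by (1 - q),
    so minimizing this loss yields the q-th conditional quantile.
    """
    diff = y_true - y_pred
    return max(q * diff, (q - 1) * diff)

# Averaged over a short series and a set of quantile levels,
# as a multi-quantile forecaster would be trained:
series, preds = [10.0, 12.0, 9.0], [11.0, 11.0, 11.0]
quantiles = [0.1, 0.5, 0.9]
loss = sum(
    pinball_loss(y, yhat, q)
    for y, yhat in zip(series, preds)
    for q in quantiles
) / (len(series) * len(quantiles))
```

In practice one output head per (horizon, quantile) pair is trained jointly under this loss, giving a discretized approximation of the full predictive distribution.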

In this paper, we propose an end-to-end deep-learning framework for multi-horizon time series forecasting, with temporal attention mechanisms that capture latent patterns in historical data useful for predicting the future. Forecasts of multiple quantiles over multiple future horizons are generated simultaneously from the learned latent pattern features. We also propose a multimodal fusion mechanism that combines features from different parts of the history to better represent the future. Experiments demonstrate that our approach achieves state-of-the-art performance on two large-scale forecasting datasets from different domains.
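The abstract leaves the attention mechanism unspecified. As an illustrative assumption (not the paper's exact architecture), a minimal dot-product temporal attention over encoded history steps can be sketched as:

```python
import math

def softmax(scores):
    # Numerically stable softmax over raw attention scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_attention(history, query):
    """Weight encoded history steps by their similarity to a query.

    history: list of feature vectors, one per past time step
    query:   feature vector for the future horizon being predicted
    Returns the attention-weighted context vector and the weights.
    """
    weights = softmax(
        [sum(h * q for h, q in zip(step, query)) for step in history]
    )
    dim = len(history[0])
    context = [
        sum(w * step[d] for w, step in zip(weights, history))
        for d in range(dim)
    ]
    return context, weights

# Toy example: two encoded history steps, one future-horizon query.
history = [[1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]
context, weights = temporal_attention(history, query)
# The history step most similar to the query receives the larger weight.
```

A horizon-specific query lets each future step attend to different parts of the history; the resulting context vectors would then feed the quantile output heads.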


Published in

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, July 2019, 3305 pages
ISBN: 9781450362016
DOI: 10.1145/3292500

              Copyright © 2019 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance Rates

KDD '19 paper acceptance rate: 110 of 1,200 submissions (9%). Overall acceptance rate: 1,133 of 8,635 submissions (13%).
