Modeling Extreme Events in Time Series Prediction

Authors:

Xiangnan He

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pages 1114 - 1122

https://doi.org/10.1145/3292500.3330896

Published: 25 July 2019

Abstract

Time series prediction is an intensively studied topic in data mining. Despite considerable improvements, recent deep learning-based methods overlook the existence of extreme events, which results in weak performance when they are applied to real time series. Extreme events are rare and random, but they play a critical role in many real applications, such as forecasting financial crises and natural disasters. In this paper, we explore the central theme of improving the ability of deep learning to model extreme events in time series prediction. Through formal analysis, we first find that the weakness of deep learning methods is rooted in the conventional form of quadratic loss. To address this issue, we take inspiration from Extreme Value Theory and develop a new form of loss, called Extreme Value Loss (EVL), for detecting the future occurrence of extreme events. Furthermore, we propose to employ a Memory Network to memorize extreme events in historical records. By incorporating EVL with an adapted memory network module, we achieve an end-to-end framework for time series prediction with extreme events. Through extensive experiments on synthetic data and two real datasets of stock and climate, we empirically validate the effectiveness of our framework. We also provide a proper choice of hyper-parameters for the proposed framework by conducting several additional experiments.
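
The abstract does not state the formula of EVL, so the following is only a rough illustration of the general idea of treating extreme events as a rare class that a loss should not ignore. It is a plain class-frequency-weighted binary cross-entropy, not the paper's EVL. The threshold value, the toy series, and the names `label_extremes` and `weighted_bce` are all assumptions made for this sketch.

```python
import math

def label_extremes(series, threshold):
    """Mark a point as extreme (1) when it exceeds the assumed threshold."""
    return [1 if y > threshold else 0 for y in series]

def weighted_bce(probs, labels, eps=1e-12):
    """Class-frequency-weighted binary cross-entropy (illustrative only).

    Each class is weighted by the proportion of the other class, so the
    rare (extreme) class contributes more per sample than the common one.
    """
    n = len(labels)
    p_extreme = sum(labels) / n            # empirical fraction of extremes
    w_pos, w_neg = 1.0 - p_extreme, p_extreme
    total = 0.0
    for u, v in zip(probs, labels):
        u = min(max(u, eps), 1.0 - eps)    # clamp to avoid log(0)
        total -= w_pos * v * math.log(u) + w_neg * (1 - v) * math.log(1.0 - u)
    return total / n

# Toy series with a single extreme value (hypothetical data).
series = [0.1, -0.2, 0.05, 3.5, 0.0, -0.1, 0.2, 0.15, -0.05, 0.1]
labels = label_extremes(series, threshold=2.0)

# One predictor never flags the extreme event, the other does.
miss = [0.1] * 10
hit = [0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
loss_miss = weighted_bce(miss, labels)
loss_hit = weighted_bce(hit, labels)
print(loss_miss > loss_hit)
```

Under this weighting, missing the single extreme point dominates the loss even though it is only one sample in ten, which is the behaviour an extreme-aware loss is meant to encourage.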

Index Terms

Modeling Extreme Events in Time Series Prediction
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks

Information & Contributors

Information

Published In

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

July 2019

3305 pages

ISBN: 9781450362016

DOI: 10.1145/3292500

General Chairs: Ankur Teredesai (KenSci), Vipin Kumar (University of Minnesota)

Program Chairs: Ying Li (EV Analysis Corporation), Rómer Rosales (LinkedIn), Evimaria Terzi (Boston University), George Karypis (University of Minnesota)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Program on Key Basic Research
National Natural Science Foundation of China

Conference

KDD '19

KDD '19: The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 4 - 8, 2019

Anchorage, AK, USA

Acceptance Rates

KDD '19 Paper Acceptance Rate 110 of 1,200 submissions, 9%;

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 94
Total Downloads: 3,146
Downloads (Last 12 months): 373
Downloads (Last 6 weeks): 35

Reflects downloads up to 14 Feb 2025

Citations

Cited By

Navacchi C, Reuß F, Wagner W. (2025). Using a Neural Network to Model the Incidence Angle Dependency of Backscatter to Produce Seamless, Analysis-Ready Backscatter Composites over Land. Remote Sensing 17:3 (361). Online publication date: 22-Jan-2025
https://doi.org/10.3390/rs17030361
Belvederesi G, Tanyas H, Lipani A, Dahal A, Lombardo L. (2025). Distribution-agnostic landslide hazard modelling via Graph Transformers. Environmental Modelling & Software 183 (106231). Online publication date: Jan-2025
https://doi.org/10.1016/j.envsoft.2024.106231
Yadav H, Thakkar A. (2025). TXtreme: transformer-based extreme value prediction framework for time series forecasting. Discover Applied Sciences 7:2. Online publication date: 24-Jan-2025
https://doi.org/10.1007/s42452-025-06478-4
Weekaew J, Ditthakit P, Kittiphattanabawon N, Pham Q. (2024). Quartile Regression and Ensemble Models for Extreme Events of Multi-Time Step-Ahead Monthly Reservoir Inflow Forecasting. Water 16:23 (3388). Online publication date: 25-Nov-2024
https://doi.org/10.3390/w16233388
Wang Y, Han Y, Guo Y, Larson K. (2024). Self-adaptive extreme penalized loss for imbalanced time series prediction. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (5135-5143). Online publication date: 3-Aug-2024
https://dl.acm.org/doi/10.24963/ijcai.2024/568
Seong J, Oh S, Choi J, Larson K. (2024). Towards dynamic trend filtering through trend point detection with reinforcement learning. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (2324-2332). Online publication date: 3-Aug-2024
https://dl.acm.org/doi/10.24963/ijcai.2024/257
Sharfeddine Z, Pütz S, Tamhane V, Hagenmeyer V, Schäfer B. (2024). Analysing and Predicting Extreme Frequency Deviations: A Case Study in the Balearic Power Grid. ACM SIGEnergy Energy Informatics Review 4:4 (155-162). Online publication date: 1-Oct-2024
https://dl.acm.org/doi/10.1145/3717413.3717427
Huang H, Yang X, He S, Tabatabaie M. (2024). Toward Ubiquitous Interaction-Attentive and Extreme-Aware Crowd Activity Level Prediction. ACM Transactions on Intelligent Systems and Technology 15:6 (1-26). Online publication date: 29-Jul-2024
https://dl.acm.org/doi/10.1145/3682063
Ji T, Self N, Fu K, Chen Z, Ramakrishnan N, Lu C. (2024). Citation Forecasting with Multi-Context Attention-Aided Dependency Modeling. ACM Transactions on Knowledge Discovery from Data 18:6 (1-23). Online publication date: 12-Apr-2024
https://dl.acm.org/doi/10.1145/3649140
Lin L, Lu Z, Wang S, Liu Y, Hong Z, Wang H, Wang S, Baeza-Yates R, Bonchi F. (2024). MulSTE: A Multi-view Spatio-temporal Learning Framework with Heterogeneous Event Fusion for Demand-supply Prediction. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (1781-1792). Online publication date: 25-Aug-2024
https://dl.acm.org/doi/10.1145/3637528.3672030
