Modeling Extreme Events in Time Series Prediction

Published: 25 July 2019

Abstract

Time series prediction is an intensively studied topic in data mining. Despite considerable improvements, recent deep learning-based methods overlook the existence of extreme events, which results in weak performance when they are applied to real time series. Extreme events are rare and random, but play a critical role in many real applications, such as forecasting financial crises and natural disasters. In this paper, we explore the central theme of improving the ability of deep learning to model extreme events in time series prediction. Through formal analysis, we first find that the weakness of deep learning methods is rooted in the conventional form of quadratic loss. To address this issue, we draw inspiration from Extreme Value Theory and develop a new form of loss, called Extreme Value Loss (EVL), for detecting the future occurrence of extreme events. Furthermore, we propose employing a Memory Network to memorize extreme events in historical records. By incorporating EVL with an adapted memory network module, we achieve an end-to-end framework for time series prediction with extreme events. Through extensive experiments on synthetic data and two real datasets of stock and climate, we empirically validate the effectiveness of our framework. We also provide a proper choice of hyper-parameters for the proposed framework by conducting several additional experiments.

    Published In

    KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
    July 2019
    3305 pages
    ISBN:9781450362016
    DOI:10.1145/3292500
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. attention model
    2. extreme event
    3. memory network

    Qualifiers

    • Research-article

    Funding Sources

    • National Program on Key Basic Research
    • National Natural Science Foundation of China

    Conference

    KDD '19

    Acceptance Rates

    KDD '19 Paper Acceptance Rate 110 of 1,200 submissions, 9%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
