Fast Mining and Forecasting of Co-evolving Epidemiological Data Streams

Published: 14 August 2022

Abstract

Given a large, semi-infinite collection of co-evolving epidemiological data containing the daily counts of cases, deaths, and recoveries in multiple locations, how can we incrementally monitor current dynamical patterns and forecast future behavior? The world faces the rapid spread of infectious diseases such as COVID-19 (caused by SARS-CoV-2), and a crucial goal is to predict potential outbreaks and pandemics as quickly as possible using data collected throughout the world. In this paper, we propose a new streaming algorithm, EpiCast, which is able to model, understand, and forecast dynamical patterns in large co-evolving epidemiological data streams. Our proposed method is designed as a dynamic and flexible system and is based on a unified non-linear differential equation. It has the following properties: (a) Effective: it operates on large co-evolving epidemiological data streams and captures important world-wide trends as well as location-specific patterns. It also performs real-time and long-term forecasting; (b) Adaptive: it incrementally monitors current dynamical patterns and identifies any abrupt changes in the streams; (c) Scalable: its running time does not depend on the data size, so it is applicable to very large data streams. In extensive experiments on real datasets, we demonstrate that EpiCast outperforms existing state-of-the-art methods in terms of both accuracy and execution speed.

Supplemental Material

MP4 File
In this video, we present EpiCast, a method for modeling and forecasting co-evolving epidemiological data streams. Using a real public COVID-19 dataset, we demonstrate that the proposed method outperforms existing methods in forecasting accuracy while significantly reducing computation time.

Published In

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2022
5033 pages
ISBN: 9781450393850
DOI: 10.1145/3534678

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data streams
  2. epidemics
  3. non-linear dynamical systems
  4. real-time forecasting
  5. tensor data analysis
  6. time series

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%