ABSTRACT
Data-driven methods for societal event forecasting exploit relevant historical information to predict future events. Because these methods rely on labeled historical data, they cannot predict events accurately when data are limited or of poor quality. Studying causal effects between events goes beyond correlation analysis and can make event prediction more robust. However, incorporating causal analysis into data-driven event forecasting is challenging for two reasons: (i) events occur in a complex and dynamic social environment, where many unobserved variables, i.e., hidden confounders, affect both potential causes and outcomes; (ii) the data are spatiotemporal and not independent and identically distributed (non-IID), so modeling hidden confounders for accurate causal effect estimation is nontrivial. In this work, we introduce a deep learning framework that integrates causal effect estimation into event forecasting. We first study the problem of Individual Treatment Effect (ITE) estimation from observational event data with spatiotemporal attributes and present a novel causal inference model to estimate ITEs. We then incorporate the learned event-related causal information into event prediction as prior knowledge. Two robust learning modules, a feature reweighting module and an approximate constraint loss, are introduced to enable this prior knowledge injection. We evaluate the proposed causal inference model on real-world event datasets and validate the effectiveness of the proposed robust learning modules in event prediction by feeding the learned causal information into different deep learning methods. Experimental results demonstrate the strengths of the proposed causal inference model for ITE estimation on societal events and show the benefits of the robust learning modules in societal event forecasting.
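To make the role of hidden confounders concrete, the following is a minimal toy sketch, not the model proposed in this work. It assumes the standard potential-outcomes definition of the ITE, i.e., the difference between a unit's outcome with and without treatment. The synthetic data, the function names (`simulate`, `naive_difference`, `adjusted_difference`), and the linear adjustment are all illustrative assumptions. The sketch also assumes the confounder `u` can be observed, which is exactly what is not available in real event data.

```python
import numpy as np


def simulate(n=5000, seed=0):
    """Synthetic units with one confounder u driving both treatment and outcome.

    All quantities are hypothetical; the true ITE is fixed at 1.0 per unit.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                         # confounder (hidden in practice)
    t = (u + rng.normal(size=n) > 0).astype(int)   # treatment is more likely when u is large
    y0 = 2.0 * u + rng.normal(scale=0.1, size=n)   # potential outcome without treatment
    y1 = y0 + 1.0                                  # potential outcome with treatment
    y = np.where(t == 1, y1, y0)                   # only one potential outcome is observed
    return u, t, y, y1 - y0


def naive_difference(t, y):
    """Difference of group means; biased when u affects both t and y."""
    return y[t == 1].mean() - y[t == 0].mean()


def adjusted_difference(u, t, y):
    """Least-squares fit of y on [1, t, u]; returns the coefficient on t."""
    X = np.column_stack([np.ones_like(u), t, u])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


if __name__ == "__main__":
    u, t, y, ite = simulate()
    print("true average ITE:   ", ite.mean())
    print("naive estimate:     ", naive_difference(t, y))
    print("adjusted estimate:  ", adjusted_difference(u, t, y))
```

In this toy setting the naive comparison overstates the effect because treated units tend to have larger `u`, while adjusting for `u` recovers a value close to the true effect. With real event data `u` is unobserved, which is why a representation of the hidden confounders has to be learned from the spatiotemporal observations instead.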
Index Terms
- Robust Event Forecasting with Spatiotemporal Confounder Learning