DOI: 10.1145/3318464.3389759
research-article

Causal Relational Learning

Published: 31 May 2020

Abstract

Causal inference is at the heart of empirical research in the natural and social sciences and is critical for scientific discovery and informed decision making. The gold standard in causal inference is performing randomized controlled trials; unfortunately, these are not always feasible due to ethical, legal, or cost constraints. As an alternative, methodologies for causal inference from observational data have been developed in statistical studies and the social sciences. However, existing methods critically rely on restrictive assumptions, such as the study population consisting of homogeneous elements that can be represented in a single flat table, where each row is referred to as a unit. In contrast, in many real-world settings the study domain consists of heterogeneous elements with complex relational structure, where the data is naturally represented in multiple related tables. In this paper, we present a formal framework for causal inference from such relational data. We propose a declarative language called CARL for capturing causal background knowledge and assumptions and for specifying causal queries using simple Datalog-like rules. CARL provides a foundation for inferring causality and reasoning about the effect of complex interventions in relational domains. We present an extensive experimental evaluation on real relational data to illustrate the applicability of CARL in the social sciences and healthcare.
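
The contrast between a single flat unit table and data spread over multiple related tables can be made concrete with a small sketch. The Python below is illustrative only: it is not CARL syntax and not the method of the paper. The tables (`authors`, `papers`, `authorship`), the attribute names, and all numbers are hypothetical toy data invented for this example. It flattens the relational data into one row per paper by aggregating author attributes, then computes a naive difference of mean outcomes between the two groups.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy relational data (assumed for illustration only).
authors = {
    "a1": {"prestigious": 1},
    "a2": {"prestigious": 0},
    "a3": {"prestigious": 0},
}
papers = {
    "p1": {"score": 0.8},
    "p2": {"score": 0.4},
    "p3": {"score": 0.6},
    "p4": {"score": 0.5},
}
# Many-to-many relation between authors and papers.
authorship = [
    ("a1", "p1"), ("a2", "p1"),
    ("a2", "p2"), ("a3", "p2"),
    ("a1", "p3"),
    ("a3", "p4"),
]


def build_unit_table(authors, papers, authorship):
    """Flatten the three tables into one row (unit) per paper.

    Author-level attributes are aggregated per paper, because a paper
    can have several authors and an author can write several papers.
    """
    flags_by_paper = defaultdict(list)
    for author_id, paper_id in authorship:
        flags_by_paper[paper_id].append(authors[author_id]["prestigious"])

    rows = []
    for paper_id in sorted(papers):
        flags = flags_by_paper.get(paper_id, [])
        rows.append({
            "paper": paper_id,
            "n_authors": len(flags),
            "any_prestigious": int(any(flags)),
            "frac_prestigious": float(mean(flags)) if flags else 0.0,
            "score": papers[paper_id]["score"],
        })
    return rows


def naive_difference(rows):
    """Difference of mean scores: papers with vs. without a prestigious author."""
    treated = [r["score"] for r in rows if r["any_prestigious"] == 1]
    control = [r["score"] for r in rows if r["any_prestigious"] == 0]
    return mean(treated) - mean(control)


unit_table = build_unit_table(authors, papers, authorship)
for row in unit_table:
    print(row)
print("naive difference of means:", naive_difference(unit_table))
```

The printed difference is only an association in the toy data, not a causal effect: the flattening step fixes one aggregation choice (any versus fraction of prestigious authors) and ignores confounding and the fact that the same author contributes to several rows. Making such choices and assumptions explicit is the kind of problem the abstract describes for relational domains.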

Supplementary Material

MP4 File (3318464.3389759.mp4)
Presentation Video

Published In

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN: 9781450367356
DOI: 10.1145/3318464
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. algorithms
  2. causal inference
  3. declarative language
  4. experiments
  5. graphical causal models
  6. relational data

Qualifiers

  • Research-article

Funding Sources

  • NIH
  • NSF

Conference

SIGMOD/PODS '20
Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 143
  • Downloads (last 6 weeks): 8
Reflects downloads up to 05 Mar 2025

Cited By

  • (2025) Toward Interpretable Hybrid AI: Integrating Knowledge Graphs and Symbolic Reasoning in Medicine. IEEE Access 13 (39489-39509). DOI: 10.1109/ACCESS.2025.3529133. Online publication date: 2025
  • (2025) Inferring individual direct causal effects under heterogeneous peer influence. Machine Learning 114:4. DOI: 10.1007/s10994-024-06729-2. Online publication date: 6-Mar-2025
  • (2024) Causal Dataset Discovery with Large Language Models. Proceedings of the 2024 Workshop on Human-In-the-Loop Data Analytics (1-8). DOI: 10.1145/3665939.3665968. Online publication date: 18-Jun-2024
  • (2024) Nexus: Correlation Discovery over Collections of Spatio-Temporal Tabular Data. Proceedings of the ACM on Management of Data 2:3 (1-28). DOI: 10.1145/3654957. Online publication date: 30-May-2024
  • (2024) Summarized Causal Explanations For Aggregate Views. Proceedings of the ACM on Management of Data 2:1 (1-27). DOI: 10.1145/3639328. Online publication date: 26-Mar-2024
  • (2024) Causal Graph Representation Learning for Outcome-Oriented Link Prediction. 2024 International Joint Conference on Neural Networks (IJCNN) (1-8). DOI: 10.1109/IJCNN60899.2024.10651266. Online publication date: 30-Jun-2024
  • (2024) CauseKG: A Framework Enhancing Causal Inference With Implicit Knowledge Deduced From Knowledge Graphs. IEEE Access 12 (61810-61827). DOI: 10.1109/ACCESS.2024.3395134. Online publication date: 2024
  • (2024) SemMatch: Semantics-Aware Matching for Causal Inference over Knowledge Graphs. Web Information Systems Engineering – WISE 2024 (467-483). DOI: 10.1007/978-981-96-0567-5_33. Online publication date: 3-Dec-2024
  • (2023) MINT: Detecting Fraudulent Behaviors from Time-Series Relational Data. Proceedings of the VLDB Endowment 16:12 (3610-3623). DOI: 10.14778/3611540.3611551. Online publication date: 12-Sep-2023
  • (2023) Causal Data Integration. Proceedings of the VLDB Endowment 16:10 (2659-2665). DOI: 10.14778/3603581.3603602. Online publication date: 1-Jun-2023
