DOI: 10.1145/1401890.1401938
research-article

Automatic identification of quasi-experimental designs for discovering causal knowledge

Published: 24 August 2008

ABSTRACT

Researchers in the social and behavioral sciences routinely rely on quasi-experimental designs to discover knowledge from large databases. Quasi-experimental designs (QEDs) exploit fortuitous circumstances in non-experimental data to identify situations (sometimes called "natural experiments") that provide the equivalent of experimental control and randomization. QEDs allow researchers in domains as diverse as sociology, medicine, and marketing to draw reliable inferences about causal dependencies from non-experimental data. Unfortunately, identifying and exploiting QEDs has remained a painstaking manual activity, requiring researchers to scour available databases and apply substantial knowledge of statistics. However, recent advances in the expressiveness of databases, and increases in their size and complexity, provide the necessary conditions to automatically identify QEDs. In this paper, we describe the first system to discover knowledge by applying quasi-experimental designs that were identified automatically. We demonstrate that QEDs can be identified in a traditional database schema and that such identification requires only a small number of extensions to that schema, knowledge about quasi-experimental design encoded in first-order logic, and a theorem-proving engine. We describe several key innovations necessary to enable this system, including methods for automatically constructing appropriate experimental units and for creating aggregate variables on those units. We show that applying the resulting designs can identify important causal dependencies in real domains, and we provide examples from academic publishing, movie making and marketing, and peer-production systems. Finally, we discuss the integration of QEDs with other approaches to causal discovery, including joint modeling and directed experimentation.

Supplemental Material

p372-jensen_400h.mov (mov, 96 MB)

      Published in

      KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2008
      1116 pages
      ISBN: 9781605581934
      DOI: 10.1145/1401890
      General Chair: Ying Li
      Program Chairs: Bing Liu, Sunita Sarawagi

      Copyright © 2008 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      KDD '08 Paper Acceptance Rate: 118 of 593 submissions, 20%
      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
