Scalable Techniques for Mining Causal Structures

Silverstein, Craig; Brin, Sergey; Motwani, Rajeev; Ullman, Jeff

doi:10.1023/A:1009891813863

Published: July 2000

Data Mining and Knowledge Discovery, Volume 4, pages 163–192 (2000)

Abstract

Mining for association rules in market basket data has proved a fruitful area of research. Measures such as conditional probability (confidence) and correlation have been used to infer rules of the form “the existence of item A implies the existence of item B.” However, such rules indicate only a statistical relationship between A and B. They do not specify the nature of the relationship: whether the presence of A causes the presence of B, or the converse, or some other attribute or phenomenon causes both to appear together. In applications, knowing such causal relationships is extremely useful for enhancing understanding and effecting change. While distinguishing causality from correlation is a truly difficult problem, recent work in statistics and Bayesian learning provides some avenues of attack. In these fields, the goal has generally been to learn complete causal models, which are essentially impossible to learn in large-scale data mining applications with a large number of variables.

In this paper, we consider the problem of determining causal relationships, instead of mere associations, when mining market basket data. We identify some problems with the direct application of Bayesian learning ideas to mining large databases, concerning both the scalability of algorithms and the appropriateness of the statistical techniques, and introduce some initial ideas for dealing with these problems. We present experimental results from applying our algorithms to several large, real-world data sets. The results indicate that the approach proposed here is both computationally feasible and successful in identifying interesting causal structures. An interesting outcome is that it is perhaps easier to infer the lack of causality than to infer causality, information that is useful in preventing erroneous decision making.
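The association measures named in the abstract can be illustrated with a small sketch. It is not the paper's algorithm. It only computes confidence and a Pearson chi-squared statistic on a hypothetical toy set of baskets, to show why these measures alone cannot indicate a causal direction. The item names, the data, and the helper names `support`, `confidence`, and `chi_squared` are all assumptions made for this example.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(baskets, a, b):
    """Estimate P(b in basket | a in basket), the confidence of the rule a => b."""
    return support(baskets, {a, b}) / support(baskets, {a})


def chi_squared(baskets, a, b):
    """Pearson chi-squared statistic for the 2x2 presence/absence table of a and b.

    Assumes each item appears in some but not all baskets, so that every
    expected cell count is non-zero.
    """
    n = len(baskets)
    p_a, p_b = support(baskets, {a}), support(baskets, {b})
    stat = 0.0
    for has_a in (True, False):
        for has_b in (True, False):
            observed = sum(
                (a in basket) == has_a and (b in basket) == has_b
                for basket in baskets
            )
            expected = n * (p_a if has_a else 1 - p_a) * (p_b if has_b else 1 - p_b)
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical toy data, invented for illustration only.
baskets = [frozenset(s) for s in (
    {"hot dogs", "buns"}, {"hot dogs", "buns"}, {"hot dogs", "buns", "soda"},
    {"buns"}, {"soda"}, {"soda"}, {"hot dogs"}, {"soda", "buns"},
)]

print(confidence(baskets, "hot dogs", "buns"))   # confidence of hot dogs => buns
print(confidence(baskets, "buns", "hot dogs"))   # confidence of buns => hot dogs
print(chi_squared(baskets, "hot dogs", "buns"))  # same value if the items are swapped
```

Confidence differs between the two rule directions only because the two items have different marginal frequencies, and the chi-squared statistic is symmetric in its two items. Neither number says whether one item causes the other or whether a third factor drives both.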

Author information

Authors and Affiliations

Department of Computer Science, Stanford University, Stanford, CA, 94305, USA
Craig Silverstein, Sergey Brin, Rajeev Motwani & Jeff Ullman

About this article

Cite this article

Silverstein, C., Brin, S., Motwani, R. et al. Scalable Techniques for Mining Causal Structures. Data Mining and Knowledge Discovery 4, 163–192 (2000). https://doi.org/10.1023/A:1009891813863
