Beyond Market Baskets: Generalizing Association Rules to Dependence Rules

Silverstein, Craig; Brin, Sergey; Motwani, Rajeev

doi:10.1023/A:1009713703947

Published: January 1998

Data Mining and Knowledge Discovery, Volume 2, pages 39–68 (1998)

Abstract

One of the more well-studied problems in data mining is the search for association rules in market basket data. Association rules are intended to identify patterns of the type: “A customer purchasing item A often also purchases item B.” Motivated partly by the goal of generalizing beyond market basket data and partly by the goal of ironing out some problems in the definition of association rules, we develop the notion of dependence rules that identify statistical dependence in both the presence and absence of items in itemsets. We propose measuring significance of dependence via the chi-squared test for independence from classical statistics. This leads to a measure that is upward-closed in the itemset lattice, enabling us to reduce the mining problem to the search for a border between dependent and independent itemsets in the lattice. We develop pruning strategies based on the closure property and thereby devise an efficient algorithm for discovering dependence rules. We demonstrate our algorithm's effectiveness by testing it on census data, text data (wherein we seek term dependence), and synthetic data.
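
As a rough illustration of the chi-squared test for independence mentioned in the abstract, the following sketch computes the statistic for a single pair of items from the 2x2 table of their presence and absence. The function name `chi_squared_pair`, the toy tea/coffee baskets, and the 3.84 cutoff (the 95% critical value for one degree of freedom) are assumptions made for this example, not the authors' implementation, and the sketch does not cover the lattice search or pruning.

```python
from itertools import product


def chi_squared_pair(baskets, a, b):
    """Chi-squared statistic for independence of items a and b.

    Builds the 2x2 contingency table over presence/absence of each item
    and sums (observed - expected)^2 / expected over the four cells.
    """
    n = len(baskets)
    if n == 0:
        raise ValueError("need at least one basket")

    # Observed counts, keyed by (a present?, b present?).
    observed = {cell: 0 for cell in product((True, False), repeat=2)}
    for basket in baskets:
        observed[(a in basket, b in basket)] += 1

    # Marginal counts for presence and absence of each item.
    count_a = {v: observed[(v, True)] + observed[(v, False)] for v in (True, False)}
    count_b = {v: observed[(True, v)] + observed[(False, v)] for v in (True, False)}

    stat = 0.0
    for (va, vb), obs in observed.items():
        # Expected count if a and b were independent.
        expected = count_a[va] * count_b[vb] / n
        if expected == 0:
            continue  # degenerate cell: an item is always or never present
        stat += (obs - expected) ** 2 / expected
    return stat


# Hypothetical toy data: 100 baskets.
baskets = (
    [{"tea", "coffee"}] * 20
    + [{"tea"}] * 5
    + [{"coffee"}] * 70
    + [set()] * 5
)

CUTOFF_95 = 3.84  # assumed significance cutoff, one degree of freedom
stat = chi_squared_pair(baskets, "tea", "coffee")
print(f"chi-squared = {stat:.2f}, dependent at 95%: {stat > CUTOFF_95}")
```

On this toy data the statistic is about 3.70, just under the assumed cutoff, so the pair would not be flagged as dependent even though 80% of tea buyers also buy coffee. Because all four cells contribute, the absence of an item counts as evidence just as its presence does.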

Author information

Authors and Affiliations

Department of Computer Science, Stanford University, Stanford, CA, 94305
Craig Silverstein, Sergey Brin & Rajeev Motwani

About this article

Cite this article

Silverstein, C., Brin, S. & Motwani, R. Beyond Market Baskets: Generalizing Association Rules to Dependence Rules. Data Mining and Knowledge Discovery 2, 39–68 (1998). https://doi.org/10.1023/A:1009713703947

Issue Date: January 1998
DOI: https://doi.org/10.1023/A:1009713703947