Extracting Share Frequent Itemsets with Infrequent Subsets

Barber, Brock; Hamilton, Howard J.

doi:10.1023/A:1022419032620

Published: April 2003

Volume 7, pages 153–185, (2003)

Data Mining and Knowledge Discovery

Abstract

Itemset share has been proposed as an additional measure of the importance of itemsets in association rule mining (Carter et al., 1997). We compare the share and support measures to illustrate that the share measure can provide useful information about numerical values that are typically associated with transaction items, which the support measure cannot. We define the problem of finding share frequent itemsets, and show that share frequency does not have the property of downward closure when it is defined in terms of the itemset as a whole. We present algorithms that do not rely on the property of downward closure, and thus are able to find share frequent itemsets that have infrequent subsets. The algorithms use heuristic methods to generate candidate itemsets. They supplement the information contained in the set of frequent itemsets from a previous pass, with other information that is available at no additional processing cost. They count only those generated itemsets that are predicted to be frequent. The algorithms are applied to a large commercial database and their effectiveness is examined using principles of classifier evaluation from machine learning.
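The contrast between support and share, and the loss of downward closure, can be illustrated with a small sketch. It assumes the usual formulation of itemset share (the numerical values of an itemset's items, summed over the transactions that contain the whole itemset, divided by the total over all items and transactions); the abstract itself does not spell out the definition. The transactions, the quantities, the `MIN_SHARE` threshold, and the function names are all hypothetical, and the code does not reproduce the heuristic algorithms of the paper.

```python
from typing import Dict, Iterable, List

# Each transaction maps an item to a numerical value (here, a quantity sold).
# The data are invented purely for illustration.
Transaction = Dict[str, int]

TRANSACTIONS: List[Transaction] = [
    {"A": 1, "B": 5},
    {"A": 1, "C": 1},
    {"C": 2},
]

MIN_SHARE = 0.3  # hypothetical minimum share threshold


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items.issubset(t))
    return hits / len(transactions)


def share(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Assumed share definition: values of the itemset's items, summed over
    transactions containing the whole itemset, divided by the grand total."""
    items = frozenset(itemset)
    total = sum(sum(t.values()) for t in transactions)
    local = sum(
        sum(t[i] for i in items)
        for t in transactions
        if items.issubset(t)
    )
    return local / total


if __name__ == "__main__":
    for candidate in ({"A"}, {"B"}, {"A", "B"}):
        label = ",".join(sorted(candidate))
        print(
            f"{{{label}}}: support={support(candidate, TRANSACTIONS):.2f} "
            f"share={share(candidate, TRANSACTIONS):.2f}"
        )
```

In this toy data the support of {A, B} is lower than the support of {A}, as downward closure requires, but the share of {A, B} exceeds the threshold while the share of its subset {A} does not. A miner that prunes candidates using infrequent subsets would therefore miss {A, B}, which is the situation the abstract describes.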

References

Agrawal, R., Imielinski, T., and Swami, A. 1993. Mining association rules between sets of items in large databases. In Proc. ACM SIGMOD Int. Conf. on the Management of Data, Washington, D.C., pp. 207–216.
Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., and Verkamo, A.I. 1996. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining, (U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (Eds.)), Menlo Park, California, pp. 307–328.
Agrawal, R. and Shafer, J.C. 1996. Parallel mining of association rules. IEEE Transactions on Knowledge and Data Engineering, 8(6):962–969.
Agrawal, R. and Srikant, R. 1994. Fast algorithms for mining association rules. In Proc. Twentieth Int. Conf. on Very Large Databases, Santiago, Chile, pp. 487–499.
Ali, K., Manganaris, S., and Srikant, R. 1997. Partial classification using association rules. In Proc. Third Int. Conf. on Knowledge Discovery in Databases and Data Mining, Newport Beach, California, pp. 115–118.
Barber, B. and Hamilton, H.J. 2000. Algorithms for mining share frequent item sets containing infrequent subsets. In Proc. Fourth European Conf. on Principles of Knowledge Discovery in Databases, Lyon, France, pp. 316–324.
Barber, B. and Hamilton, H.J. 2001. Parametric algorithms for mining share frequent item sets. Journal of Intelligent Information Systems, 16(3):277–293.
Bayardo, R.J., Agrawal, R., and Gunopulos, D. 1999. Constraint based rule mining in large dense databases. In Proc. 15th Int. Conf. on Data Engineering, Sydney, Australia, pp. 188–197.
Brin, S., Motwani, R., and Silverstein, C. 1997a. Beyond market baskets: Generalizing association rules to correlations. In Proc. ACM SIGMOD Int. Conf. on the Management of Data, New York, pp. 265–276.
Brin, S., Motwani, R., Ullman, J.D., and Tsur, S. 1997b. Dynamic item set counting and implication rules for market basket data. In Proc. ACM SIGMOD Int. Conf. on the Management of Data, New York, pp. 255–264.
Buchter, O. and Wirth, R. 1998. Discovery of association rules over ordinal data: A new and faster algorithm and its application to market basket data. In Proc. Second Pacific-Asia Conf. on Knowledge Discovery and Data Mining, Melbourne, Australia, pp. 36–47.
Cai, C.H., Fu, A., Cheng, C.H., and Kwong, W.W. 1998. Mining association rules with weighted items. In Proc. of IEEE Int. Database Engineering and Applications Symposium, Cardiff, United Kingdom, pp. 68–77.
Carter, C.L., Hamilton, H.J., and Cercone, N. 1997. Share based measures for item sets. In Proc. First European Conf. on the Principles of Data Mining and Knowledge Discovery, Trondheim, Norway, pp. 14–24.
Cheung, D.W., Ng, V.T., Fu, A.W., and Fu, Y. 1996. Efficient mining of association rules in distributed databases. IEEE Transactions on Knowledge and Data Engineering, 8(6):911–922.
Frawley, W.J., Piatetsky-Shapiro, G., and Matheus, C.J. 1991. Knowledge discovery in databases: An overview. In Knowledge Discovery in Databases, (G. Piatetsky-Shapiro and W.J. Frawley (Eds.)), Menlo Park: AAAI/MIT Press, pp. 1–27.
Han, J. and Fu, Y. 1995. Discovery of multiple-level association rules from large databases. In Proc. Int. Conf. on Very Large Databases, Zurich, Switzerland, pp. 420–431.
Hidber, C. 1999. Online association rule mining. In Proc. ACM SIGMOD Int. Conf. on Management of Data, Philadelphia, Pennsylvania, pp. 145–156.
Hilderman, R.J., Carter, C., Hamilton, H.J., and Cercone, N. 1998. Mining association rules from market basket data using share measures and characterized item sets. Int. J. of Artificial Intelligence Tools, 7(2):189–220.
Hipp, M., Myka, A., Wirth, R., and Güntzer, U. 1998. A new algorithm for faster mining of generalized association rules. In Proc. Second European Symposium on Principles of Data Mining and Knowledge Discovery, Nantes, France, pp. 74–82.
Kohavi, R. and Provost, F. 1998. Glossary of terms. Machine Learning, 30(2):271–274.
Koperski, K. and Han, J. 1995. Discovery of spatial association rules in geographic information databases. In Proc. Fourth Int. Symposium on Large Spatial Databases, Portland, Maine, pp. 47–66.
Kubat, M., Holte, R.C., and Matwin, S. 1998. Machine learning for the detection of oil spills in satellite radar images. Machine Learning, 30(1):195–215.
Lewis, D.D. and Gale, A.G. 1994. A sequential algorithm for training text classifiers. In Proc. 17th Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, Dublin, Ireland, pp. 3–12.
Lin, T.Y., Yao, Y.Y., and Louie, E. 2002. Value added association rules. In Proc. Sixth Pacific-Asia Conf. on Knowledge Discovery and Data Mining, Taipei, Taiwan, pp. 328–333.
Lu, S., Hu, H., and Li, F. 2001. Mining weighted association rules. Intelligent Data Analysis, 5(3):211–225.
Mannila, H., Toivonen, H., and Verkamo, A.I. 1994. Efficient algorithms for discovering association rules. In Proc. 1994 AAAI Workshop on Knowledge Discovery in Databases, Seattle, Washington, pp. 144–155.
Masand, B. and Piatetsky-Shapiro, G. 1996. A comparison of approaches for maximizing business payoff of predictive models. In Proc. Second Int. Conf. on Knowledge Discovery and Data Mining, Portland, Oregon, pp. 195–201.
Megiddo, N. and Srikant, R. 1998. Discovering predictive association rules. In Proc. Fourth Int. Conf. on Knowledge Discovery and Data Mining, New York, pp. 274–278.
Park, J.S., Chen, M., and Yu, P. 1995. An effective Hash-based algorithm for mining association rules. In Proc. ACM SIGMOD Int. Conf. on the Management of Data, San Jose, California, pp. 175–186.
Pei, J., Han, J., and Lakshmanan, L.V.S. 2001. Mining frequent itemsets with convertible constraints. In Proc. 2001 Int. Conf. on Data Engineering, Heidelberg, Germany, pp. 433–442.
Provost, F. and Fawcett, T. 1997. Analysis and visualization of classifier performance: Comparison under imprecise class and cost distribution. In Proc. Third Int. Conf. on Knowledge Discovery and Data Mining, Newport Beach, California, pp. 43–48.
Provost, F., Fawcett, T., and Kohavi, R. 1998. Building the case against accuracy estimation for comparing induction algorithms. In Proc. Fifteenth Int. Conf. on Machine Learning, Madison, Wisconsin, pp. 445–453.
Silverstein, C., Brin, S., and Motwani, R. 1998. Beyond market baskets: Generalizing association rules to dependence rules. Data Mining and Knowledge Discovery, 2(1):39–68.
Srikant, R. and Agrawal, R. 1996. Mining quantitative association rules in large relational tables. In Proc. ACM SIGMOD Conf. on the Management of Data, Montreal, Canada, pp. 1–12.
Swets, J.A. 1988. Measuring the accuracy of diagnostic systems. Science, 240:1285–1293.
Zaki, M.J., Parthasarathy, S., Ogihara, M., and Li, W. 1997. New algorithms for fast discovery of association rules. In Proc. Third Int. Conf. on Knowledge Discovery and Data Mining, Newport Beach, California, pp. 283–286.

Author information

Authors and Affiliations

Department of Computer Science, University of Regina, 3737 Wascana Parkway, Regina, SK, S4S 0A2, Canada
Brock Barber & Howard J. Hamilton

Corresponding author

Correspondence to Howard J. Hamilton.

About this article

Barber, B., Hamilton, H.J. Extracting Share Frequent Itemsets with Infrequent Subsets. Data Mining and Knowledge Discovery 7, 153–185 (2003). https://doi.org/10.1023/A:1022419032620

Issue Date: April 2003
DOI: https://doi.org/10.1023/A:1022419032620
