Frequent Itemset Counting Across Multiple Tables

Jensen, Viviane Crestana; Soparkar, Nandit

doi:10.1007/3-540-45571-X_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1805)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Available technology for mining data usually applies to centrally stored data (i.e., homogeneous, and in one single repository and schema). The few extensions to mining algorithms for decentralized data have largely been for load balancing. In this paper, we examine mining decentralized data for the task of finding frequent itemsets. In contrast to current techniques where data is first joined to form a single table, we exploit the inter-table foreign key relationships to obtain decentralized algorithms that execute concurrently on the separate tables, and thereafter, merge the results. In particular, for typical warehouse schema designs, our approach adapts standard algorithms, and works efficiently. We provide analyses and empirical validation for important cases to exhibit how our approach performs well. In doing so, we also compare two of our approaches in merging results from individual tables, and thereby, we exhibit certain memory vs I/O trade-offs that are inherent in merging of decentralized partial results.
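
The idea in the abstract (count within each table, then merge along the foreign keys) can be illustrated with a minimal sketch. It assumes a star schema with one fact table and two dimension tables whose rows are sets of attribute=value items. The function names (`key_weights`, `local_frequent`, `merge_cross_table`) and the data layout are hypothetical, chosen only for illustration, and the brute-force subset enumeration stands in for whatever standard algorithm (for example, a level-wise one) would be adapted in practice. It is not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def key_weights(fact_rows, position):
    """Number of fact rows that reference each foreign key at `position`."""
    return Counter(row[position] for row in fact_rows)


def local_frequent(dim_rows, weights, min_support):
    """Frequent itemsets inside one dimension table, without joining.

    A dimension row referenced by w fact rows would appear w times in the
    joined table, so each of its itemsets is counted with weight w.
    Assumption: rows are small enough to enumerate all subsets directly.
    """
    counts = Counter()
    for key, items in dim_rows.items():
        w = weights.get(key, 0)
        if w == 0:
            continue  # row never referenced, so it contributes no support
        for size in range(1, len(items) + 1):
            for subset in combinations(sorted(items), size):
                counts[frozenset(subset)] += w
    return {s: c for s, c in counts.items() if c >= min_support}


def merge_cross_table(fact_rows, dim_a, dim_b, freq_a, freq_b, min_support):
    """Count itemsets that span both tables with one pass over the fact table.

    Only unions of locally frequent itemsets are candidates, because every
    subset of a frequent itemset must itself be frequent.
    """
    counts = Counter()
    for key_a, key_b in fact_rows:
        parts_a = [s for s in freq_a if s <= dim_a[key_a]]
        parts_b = [s for s in freq_b if s <= dim_b[key_b]]
        for a in parts_a:
            for b in parts_b:
                counts[a | b] += 1
    return {s: c for s, c in counts.items() if c >= min_support}
```

In this sketch the two calls to `local_frequent` are independent of each other and could run concurrently, one per dimension table; only `merge_cross_table` needs to scan the fact table. Whether the per-table results are held in memory during the merge or re-read from storage is the kind of memory vs I/O trade-off the abstract refers to, although the sketch itself only shows the in-memory variant.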

This work was initiated when the authors were visiting IBM T.J. Watson Research Center, and was supported partially by IBM Research funds.

Author information

Authors and Affiliations

Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, MI, 48109-2122
Viviane Crestana Jensen & Nandit Soparkar

Editor information

Editors and Affiliations

Graduate School of Systems Management, University of Tsukuba, 3-29-1 Otsuka, Bunkyo-ku, Tokyo, 112-0012, Japan
Takao Terano
Department of Computer Science and Engineering, Arizona State University, P.O. Box 875 406, Tempe, AZ, 85287-5406
Huan Liu
Department of Computer Science, National Tsing Hua University, Hsinchu, 300, Taiwan ROC
Arbee L. P. Chen

About this paper

Cite this paper

Jensen, V.C., Soparkar, N. (2000). Frequent Itemset Counting Across Multiple Tables. In: Terano, T., Liu, H., Chen, A.L.P. (eds) Knowledge Discovery and Data Mining. Current Issues and New Applications. PAKDD 2000. Lecture Notes in Computer Science, vol 1805. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45571-X_8

DOI: https://doi.org/10.1007/3-540-45571-X_8
Published: 24 March 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67382-8
Online ISBN: 978-3-540-45571-4
eBook Packages: Springer Book Archive