Mining Frequent Closed Itemsets from Distributed Repositories

Lucchese, Claudio; Orlando, Salvatore; Perego, Raffaele; Silvestri, Claudio

doi:10.1007/978-0-387-37831-2_14

Claudio Lucchese⁴,
Salvatore Orlando⁴,
Raffaele Perego⁵ &
…
Claudio Silvestri⁴

276 Accesses
1 Citations

Abstract

In this paper we address the problem of mining frequent closed itemsets in a highly distributed setting like a Grid. The extraction of frequent (closed) item-sets is a very expensive phase needed to extract from a transactional database a reduced set of meaningful association rules. We figure out an environment where different datasets are stored in different sites. We assume that, due to the huge size of datasets and privacy concerns, dataset partitions cannot be moved to a centralized site where to materialize the whole dataset and perform the mining task. Thus it becomes mandatory to perform separate mining at each site, and then merge local results for deriving global knowledge.

This paper shows how frequent closed itemsets, mined independently at each site, can be merged in order to derive globally frequent closed itemsets. Unfortunately, such merging might produce a superset of all the frequent closed itemsets, while the associated supports could be smaller than the exact ones because some globally frequent closed itemsets might be not locally frequent in some partitions. To avoid an expensive post-processing phase, needed to compute exact global results, we use a method to approximate the supports of closed itemsets. The approximation is only needed for those globally (closed) frequent itemsets which are locally infrequent on some dataset partitions, and thus are not returned at all from the corresponding sites.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 89.00; Price excludes VAT (USA)

Softcover Book: USD 119.99; Price excludes VAT (USA)

Hardcover Book: USD 169.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Proc. of the 1st Workshop on Frequent Itemset Mining Implementations (FIMI’03). 2003.
Google Scholar
R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. VLDB’ 94, pages 487–499, September 1994.
Google Scholar
Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, and Shalom Tsur. Dynamic itemset counting and implication rules for market basket data. In Joan Peckham, editor, SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, May 13–15, 1997, Tucson, Arizona, USA, pages 255–264. ACM Press, 05 1997.
Google Scholar
E-H. S. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. In IEEE Transaction on Knowledge and Data Engineering, 2000.
Google Scholar
Jiawei Han, Jian Pei, and Yiwen Yin. Mining frequent patterns without candidate generation. In Proc. SIGMOD’ 00, pages 1–12, 2000.
Google Scholar
H. Kargupta and P. Chan (Eds.). Advances in Distributed and Parallel Knowledge Discovery. AAAI Press/The MIT Press, 2004.
Google Scholar
J. Liu, Y. Pan, K. Wang, and J. Han. Mining Frequent Item Sets by Opportunistic Projection. In Proc. 2002 Int. Conf. on Knowledge Discovery in Databases (KDD’ 02), Edmonton, Canada, 2002.
Google Scholar
Claudio Lucchese, Salvatore Orlando, and Raffaele Perego. Fast and memory efficient mining of frequent closed itemsets. IEEE Transactions on Knowledge and Data Engineering, 18(1):21–36, January 2006.
Article Google Scholar
A. Mueller. Fast sequential and parallel algorithms for association rules mining: A comparison. Technical Report CS-TR-3515, Univ. of Maryland, 1995.
Google Scholar
S. Orlando, P. Palmerini, R. Perego, and F. Silvestri. Adaptive and resource-aware mining of frequent sets. In Proc. The 2002 IEEE International Conference on Data Mining (ICDM’ 02), pages 338–345, 2002.
Google Scholar
Feng Pan, Gao Cong, Anthony K. H. Tung, Jiong Yang, and Mohammed Javeed Zaki. Carpenter: finding closed patterns in long biological datasets. In KDD, pages 637–642, 2003.
Google Scholar
B. Park and H. Kargupta. Distributed Data Mining: Algorithms, Systems, and Applications. In Data Mining Handbook, pages 341–358. IEA, 2002.
Google Scholar
Srinivasan Parthasarathy. Efficient progressive sampling for association rules. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM’02), page 354. IEEE Computer Society, 2002.
Google Scholar
Nicolas Pasquier, Yves Bastide, Rafik Taouil, and Lotfi Lakhal. Efficient mining of association rules using closed itemset lattices. Information Systems, 24(1):25–46, 1999.
Article Google Scholar
J. Pei, J. Han, H. Lu, S. Nishio, and D. Tang, S. amd Yang. H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases. In Proc. The 2001 IEEE International Conference on Data Mining (ICDM’01), San Jose, CA, USA, 2000.
Google Scholar
Jian Pei, Jiawei Han, and Jianyong Wang. Closet+: Searching for the best strategies for mining frequent closed itemsets. In SIGKDD’ 03, August 2003.
Google Scholar
Ashoka Savasere, Edward Omiecinski, and Shamkant B. Navathe. An efficient algorithm for mining association rules in large databases. In VLDB’95, Proceedings of 21th International Conference on Very Large Data Bases, pages 432–444. Morgan Kaufmann, September 1995.
Google Scholar
C. Silvestri and S. Orlando. Distributed Approximate Mining of Frequent Patterns. In Proc. of the 2005 ACM Symposium on Applied Computing, SAC 2005, special track on Data Mining, pages 529–536, 2005.
Google Scholar
C. Silvestri and S. Orlando. Approximate mining of frequent patterns on streams. Intelligent Data Analysis, 2006. To appear.
Google Scholar
Rafik Taouil, Nicolas Pasquier, Yves Bastide, Lotfi Lajhal, and Gerd Stumme. Mining freqent patterns with counting inference. SIGKDD Explorations, 2(2):66–75, December 2000.
Article Google Scholar
Rafik Taouil, Nicolas Pasquier, Yves Bastide, and Lotfi Lakhal. Mining bases for association rules using closed sets. In ICDE, 2000.
Google Scholar
M. J. Zaki. Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering, 12:372–390, May/June 2000.
Article Google Scholar
Mohammed J. Zaki. Mining non-redundant association rules. Data Min. Knowl. Discov., 9(3):223–248, 2004.
Article MathSciNet Google Scholar
Mohammed J. Zaki and Karam Gouda. Fast vertical mining using diffsets. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 326–335. ACM Press, 2003.
Google Scholar
Mohammed J. Zaki and Ching-Jui Hsiao. Charm: An efficient algorithm for closed item-sets mining. In 2nd SIAM International Conference on Data Mining, April 2002.
Google Scholar

Download references

Author information

Authors and Affiliations

Dept. of Computer Science, Ca’ Foscari University of Venice, Italy
Claudio Lucchese, Salvatore Orlando & Claudio Silvestri
HPC Laboratory, ISTI-CNR of Pisa, Italy
Raffaele Perego

Authors

Claudio Lucchese
View author publications
You can also search for this author in PubMed Google Scholar
Salvatore Orlando
View author publications
You can also search for this author in PubMed Google Scholar
Raffaele Perego
View author publications
You can also search for this author in PubMed Google Scholar
Claudio Silvestri
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Dipto. Elettronica Informatica Sistemistica (DEIS), Università Calabria, via P. Bucci, 41 c, 87036, Rende, Italy
Domenico Talia
ICS-FORTH, P O BOX 1385, 711 10, Heraklion, Greece
Angelos Bilas
Dept. Computer Science, University of Cyprus, 75 Kallipoleos Str., P.O.Box 20537, 1678, Nicosia, Cyprus
Marios D. Dikaiakos

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Lucchese, C., Orlando, S., Perego, R., Silvestri, C. (2007). Mining Frequent Closed Itemsets from Distributed Repositories. In: Talia, D., Bilas, A., Dikaiakos, M.D. (eds) Knowledge and Data Management in GRIDs. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-37831-2_14

Download citation

DOI: https://doi.org/10.1007/978-0-387-37831-2_14
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-37830-5
Online ISBN: 978-0-387-37831-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics