Abstract
In this paper we address the problem of mining frequent closed itemsets in a highly distributed setting like a Grid. The extraction of frequent (closed) item-sets is a very expensive phase needed to extract from a transactional database a reduced set of meaningful association rules. We figure out an environment where different datasets are stored in different sites. We assume that, due to the huge size of datasets and privacy concerns, dataset partitions cannot be moved to a centralized site where to materialize the whole dataset and perform the mining task. Thus it becomes mandatory to perform separate mining at each site, and then merge local results for deriving global knowledge.
This paper shows how frequent closed itemsets, mined independently at each site, can be merged in order to derive globally frequent closed itemsets. Unfortunately, such merging might produce a superset of all the frequent closed itemsets, while the associated supports could be smaller than the exact ones because some globally frequent closed itemsets might be not locally frequent in some partitions. To avoid an expensive post-processing phase, needed to compute exact global results, we use a method to approximate the supports of closed itemsets. The approximation is only needed for those globally (closed) frequent itemsets which are locally infrequent on some dataset partitions, and thus are not returned at all from the corresponding sites.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Proc. of the 1st Workshop on Frequent Itemset Mining Implementations (FIMI’03). 2003.
R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. VLDB’ 94, pages 487–499, September 1994.
Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, and Shalom Tsur. Dynamic itemset counting and implication rules for market basket data. In Joan Peckham, editor, SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, May 13–15, 1997, Tucson, Arizona, USA, pages 255–264. ACM Press, 05 1997.
E-H. S. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. In IEEE Transaction on Knowledge and Data Engineering, 2000.
Jiawei Han, Jian Pei, and Yiwen Yin. Mining frequent patterns without candidate generation. In Proc. SIGMOD’ 00, pages 1–12, 2000.
H. Kargupta and P. Chan (Eds.). Advances in Distributed and Parallel Knowledge Discovery. AAAI Press/The MIT Press, 2004.
J. Liu, Y. Pan, K. Wang, and J. Han. Mining Frequent Item Sets by Opportunistic Projection. In Proc. 2002 Int. Conf. on Knowledge Discovery in Databases (KDD’ 02), Edmonton, Canada, 2002.
Claudio Lucchese, Salvatore Orlando, and Raffaele Perego. Fast and memory efficient mining of frequent closed itemsets. IEEE Transactions on Knowledge and Data Engineering, 18(1):21–36, January 2006.
A. Mueller. Fast sequential and parallel algorithms for association rules mining: A comparison. Technical Report CS-TR-3515, Univ. of Maryland, 1995.
S. Orlando, P. Palmerini, R. Perego, and F. Silvestri. Adaptive and resource-aware mining of frequent sets. In Proc. The 2002 IEEE International Conference on Data Mining (ICDM’ 02), pages 338–345, 2002.
Feng Pan, Gao Cong, Anthony K. H. Tung, Jiong Yang, and Mohammed Javeed Zaki. Carpenter: finding closed patterns in long biological datasets. In KDD, pages 637–642, 2003.
B. Park and H. Kargupta. Distributed Data Mining: Algorithms, Systems, and Applications. In Data Mining Handbook, pages 341–358. IEA, 2002.
Srinivasan Parthasarathy. Efficient progressive sampling for association rules. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM’02), page 354. IEEE Computer Society, 2002.
Nicolas Pasquier, Yves Bastide, Rafik Taouil, and Lotfi Lakhal. Efficient mining of association rules using closed itemset lattices. Information Systems, 24(1):25–46, 1999.
J. Pei, J. Han, H. Lu, S. Nishio, and D. Tang, S. amd Yang. H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases. In Proc. The 2001 IEEE International Conference on Data Mining (ICDM’01), San Jose, CA, USA, 2000.
Jian Pei, Jiawei Han, and Jianyong Wang. Closet+: Searching for the best strategies for mining frequent closed itemsets. In SIGKDD’ 03, August 2003.
Ashoka Savasere, Edward Omiecinski, and Shamkant B. Navathe. An efficient algorithm for mining association rules in large databases. In VLDB’95, Proceedings of 21th International Conference on Very Large Data Bases, pages 432–444. Morgan Kaufmann, September 1995.
C. Silvestri and S. Orlando. Distributed Approximate Mining of Frequent Patterns. In Proc. of the 2005 ACM Symposium on Applied Computing, SAC 2005, special track on Data Mining, pages 529–536, 2005.
C. Silvestri and S. Orlando. Approximate mining of frequent patterns on streams. Intelligent Data Analysis, 2006. To appear.
Rafik Taouil, Nicolas Pasquier, Yves Bastide, Lotfi Lajhal, and Gerd Stumme. Mining freqent patterns with counting inference. SIGKDD Explorations, 2(2):66–75, December 2000.
Rafik Taouil, Nicolas Pasquier, Yves Bastide, and Lotfi Lakhal. Mining bases for association rules using closed sets. In ICDE, 2000.
M. J. Zaki. Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering, 12:372–390, May/June 2000.
Mohammed J. Zaki. Mining non-redundant association rules. Data Min. Knowl. Discov., 9(3):223–248, 2004.
Mohammed J. Zaki and Karam Gouda. Fast vertical mining using diffsets. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 326–335. ACM Press, 2003.
Mohammed J. Zaki and Ching-Jui Hsiao. Charm: An efficient algorithm for closed item-sets mining. In 2nd SIAM International Conference on Data Mining, April 2002.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2007 Springer Science+Business Media, LLC
About this paper
Cite this paper
Lucchese, C., Orlando, S., Perego, R., Silvestri, C. (2007). Mining Frequent Closed Itemsets from Distributed Repositories. In: Talia, D., Bilas, A., Dikaiakos, M.D. (eds) Knowledge and Data Management in GRIDs. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-37831-2_14
Download citation
DOI: https://doi.org/10.1007/978-0-387-37831-2_14
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-37830-5
Online ISBN: 978-0-387-37831-2
eBook Packages: Computer ScienceComputer Science (R0)