Article

Out-of-core frequent pattern mining on a commodity PC

Authors:

Gregory Buehrer,

Srinivasan Parthasarathy,

Amol Ghoting

KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 86 - 95

https://doi.org/10.1145/1150402.1150416

Published: 20 August 2006

Abstract

In this work we focus on the problem of frequent itemset mining on large, out-of-core data sets. After presenting a characterization of existing out-of-core frequent itemset mining algorithms and their drawbacks, we introduce our efficient, highly scalable solution. Presented in the context of the FPGrowth algorithm, our technique involves several novel I/O-conscious optimizations, such as approximate hash-based sorting and blocking, and leverages recent architectural advancements in commodity computers, such as 64-bit processing. We evaluate the proposed optimizations on truly large data sets, up to 75 GB, and show that they improve execution time by more than 400-fold. Finally, we discuss the impact of this research in the context of other pattern mining challenges, such as sequence mining and graph mining.
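For readers unfamiliar with the FPGrowth algorithm the abstract builds on, the following is a minimal in-memory sketch of classic FP-Growth (Han, Pei, and Yin, SIGMOD 2000). It is an illustration under stated assumptions, not the authors' out-of-core implementation: it omits the approximate hash-based sorting, blocking, and 64-bit optimizations the paper introduces, which the abstract does not describe in enough detail to reproduce. The names FPNode, build_fp_tree, fp_growth, and frequent_itemsets are hypothetical, min_support is assumed to be an absolute transaction count, and the sample transactions are invented for illustration.

```python
from collections import defaultdict


class FPNode:
    """An FP-tree node: an item, an accumulated count, a parent link, and children keyed by item."""
    __slots__ = ("item", "count", "parent", "children")

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs in two passes.

    Returns the frequent items with their supports and a header table that maps
    each frequent item to every tree node carrying it.
    """
    # Pass 1: count item supports (duplicates inside one transaction count once).
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    frequent = {item: s for item, s in support.items() if s >= min_support}

    # Pass 2: insert each transaction's frequent items in descending-support order
    # (ties broken by item, so items must be mutually comparable); transactions
    # with a common frequent prefix then share one path in the tree.
    root = FPNode(None, None)
    header = defaultdict(list)
    for items, count in weighted_transactions:
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return frequent, header


def fp_growth(weighted_transactions, min_support, suffix=()):
    """Recursively yield (frozenset itemset, support) for every frequent itemset."""
    frequent, header = build_fp_tree(weighted_transactions, min_support)
    for item, item_support in frequent.items():
        itemset = suffix + (item,)
        yield frozenset(itemset), item_support
        # Conditional pattern base: the prefix path above each node holding `item`,
        # weighted by that node's count.
        conditional_base = []
        for node in header[item]:
            prefix = []
            parent = node.parent
            while parent.item is not None:  # stop at the root
                prefix.append(parent.item)
                parent = parent.parent
            if prefix:
                conditional_base.append((prefix, node.count))
        # Mine the conditional base; every itemset found there is extended by `itemset`.
        yield from fp_growth(conditional_base, min_support, itemset)


def frequent_itemsets(transactions, min_support):
    """Mine plain transactions (lists of items); min_support is an absolute count."""
    return dict(fp_growth([(t, 1) for t in transactions], min_support))


transactions = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
]
for itemset, support in sorted(frequent_itemsets(transactions, 3).items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With min_support = 3, the sample reports four frequent single items and four frequent pairs ({bread, milk}, {bread, diaper}, {diaper, milk}, {beer, diaper}); no triple reaches a support of 3. Every structure in this sketch lives in RAM, an assumption that breaks on out-of-core inputs such as the 75 GB data sets mentioned in the abstract, which is the setting the paper's I/O-conscious optimizations address.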

Index Terms

Out-of-core frequent pattern mining on a commodity PC
1. Information systems
  1. Information systems applications
    1. Data mining

Published In

KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

August 2006

986 pages

ISBN: 1595933395

DOI: 10.1145/1150402

Conference Chair: Tina Eliassi-Rad (LLNL)

General Chair: Lyle Ungar (University of Pennsylvania)

Program Chairs: Mark Craven (University of Wisconsin); Dimitrios Gunopulos (University of California, Riverside)

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 August 2006

Qualifiers

Article

Conference

KDD06

Sponsor:

KDD06: The 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 20 - 23, 2006

Philadelphia, PA, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics & Citations

Article Metrics

Total Citations: 25

Total Downloads: 756

Downloads (last 12 months): 0

Downloads (last 6 weeks): 0

Reflects downloads up to 13 Feb 2025

Citations

Cited By

Wu Z, Li C, Cao J, Ge Y (2020). On Scalability of Association-rule-based Recommendation. ACM Transactions on the Web 14:3, pp. 1-21. Online publication date: 21-Jun-2020.
https://dl.acm.org/doi/10.1145/3398202
Chon K, Kim M (2017). SSDMiner: A Scalable and Fast Disk-Based Frequent Pattern Miner. Proceedings of the 7th International Conference on Emerging Databases, pp. 99-110. Online publication date: 14-Oct-2017.
https://doi.org/10.1007/978-981-10-6520-0_11
Shohdy S, Vishnu A, Agrawal G (2016). Fault Tolerant Frequent Pattern Mining. 2016 IEEE 23rd International Conference on High Performance Computing (HiPC), pp. 12-21. Online publication date: Dec-2016.
https://doi.org/10.1109/HiPC.2016.012
Wang Y, Chen W, Wu R, Liu Z (2016). FIM Algorithm for Multi-Source Heterogeneous Information in Power Grid Intelligent Dispatching. 2016 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), pp. 96-99. Online publication date: Oct-2016.
https://doi.org/10.1109/CyberC.2016.27
Vishnu A, Agarwal K (2015). Large Scale Frequent Pattern Mining Using MPI One-Sided Model. Proceedings of the 2015 IEEE International Conference on Cluster Computing, pp. 138-147. Online publication date: 8-Sep-2015.
https://dl.acm.org/doi/10.1109/CLUSTER.2015.30
Braun P, Cameron J, Cuzzocrea A, Jiang F, Leung C (2014). Effectively and Efficiently Mining Frequent Patterns from Dense Graph Streams on Disk. Procedia Computer Science 35, pp. 338-347. Online publication date: 2014.
https://doi.org/10.1016/j.procs.2014.08.114
Cuzzocrea A, Jiang F, Lee W, Leung C (2014). Efficient Frequent Itemset Mining from Dense Data Streams. Web Technologies and Applications, pp. 593-601. Online publication date: 2014.
https://doi.org/10.1007/978-3-319-11116-2_56
Cameron J, Cuzzocrea A, Leung C, Shin S, Maldonado J (2013). Stream mining of frequent sets with limited memory. Proceedings of the 28th Annual ACM Symposium on Applied Computing, pp. 173-175. Online publication date: 18-Mar-2013.
https://dl.acm.org/doi/10.1145/2480362.2480398
Dwivedi V (2013). Disk-resident high utility pattern mining: A trie structure implementation. 2013 International Conference on Information Systems and Computer Networks, pp. 44-49. Online publication date: Mar-2013.
https://doi.org/10.1109/ICISCON.2013.6524171
Cameron J, Cuzzocrea A, Jiang F, Leung C (2013). Mining frequent itemsets from sparse data streams in limited memory environments. Proceedings of the 14th International Conference on Web-Age Information Management, pp. 51-57. Online publication date: 14-Jun-2013.
https://dl.acm.org/doi/10.1007/978-3-642-38562-9_5