DOI: 10.1145/1150402.1150416
Article

Out-of-core frequent pattern mining on a commodity PC

Published: 20 August 2006

Abstract

In this work we focus on the problem of frequent itemset mining on large, out-of-core data sets. After characterizing existing out-of-core frequent itemset mining algorithms and their drawbacks, we introduce our efficient, highly scalable solution. Presented in the context of the FPGrowth algorithm, our technique involves several novel I/O-conscious optimizations, such as approximate hash-based sorting and blocking, and leverages recent architectural advancements in commodity computers, such as 64-bit processing. We evaluate the proposed optimizations on truly large data sets, up to 75GB, and show that they yield a greater than 400-fold improvement in execution time. Finally, we discuss the impact of this research in the context of other pattern mining challenges, such as sequence mining and graph mining.
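
For readers new to the problem, the following is a minimal sketch of what frequent itemset mining computes: every set of items that occurs in at least a given number of transactions. It is a brute-force, in-memory illustration of the problem definition only. It is not FPGrowth and not the out-of-core technique summarized in the abstract. The function name `frequent_itemsets`, the `max_size` cap, and the toy transactions are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: count} for itemsets found in >= min_support transactions.

    Brute-force illustration of the problem definition; not FPGrowth.
    """
    counts = Counter()
    for transaction in transactions:        # single pass over the transactions
        items = sorted(set(transaction))    # de-duplicate and fix an item order
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1        # one count per containing transaction
    # Keep only the itemsets that meet the support threshold.
    return {s: c for s, c in counts.items() if c >= min_support}


# Hypothetical toy data set: four market-basket transactions.
toy = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]
result = frequent_itemsets(toy, min_support=2)
```

Enumerating every subset of every transaction grows exponentially with transaction length, which is why practical miners use compact structures such as prefix trees, and why data sets larger than main memory need I/O-conscious designs.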

    Published In

    KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2006
    986 pages
    ISBN:1595933395
    DOI:10.1145/1150402

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data mining
    2. itemsets
    3. out of core
    4. pattern mining
    5. secondary memory

    Qualifiers

    • Article

    Conference

    KDD06

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Cited By

    • (2020) On Scalability of Association-rule-based Recommendation. ACM Transactions on the Web 14:3 (1-21). DOI: 10.1145/3398202. Online publication date: 21-Jun-2020
    • (2017) SSDMiner: A Scalable and Fast Disk-Based Frequent Pattern Miner. Proceedings of the 7th International Conference on Emerging Databases (99-110). DOI: 10.1007/978-981-10-6520-0_11. Online publication date: 14-Oct-2017
    • (2016) Fault Tolerant Frequent Pattern Mining. 2016 IEEE 23rd International Conference on High Performance Computing (HiPC) (12-21). DOI: 10.1109/HiPC.2016.012. Online publication date: Dec-2016
    • (2016) FIM Algorithm for Multi-Source Heterogeneous Information in Power Grid Intelligent Dispatching. 2016 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC) (96-99). DOI: 10.1109/CyberC.2016.27. Online publication date: Oct-2016
    • (2015) Large Scale Frequent Pattern Mining Using MPI One-Sided Model. Proceedings of the 2015 IEEE International Conference on Cluster Computing (138-147). DOI: 10.1109/CLUSTER.2015.30. Online publication date: 8-Sep-2015
    • (2014) Effectively and Efficiently Mining Frequent Patterns from Dense Graph Streams on Disk. Procedia Computer Science 35 (338-347). DOI: 10.1016/j.procs.2014.08.114. Online publication date: 2014
    • (2014) Efficient Frequent Itemset Mining from Dense Data Streams. Web Technologies and Applications (593-601). DOI: 10.1007/978-3-319-11116-2_56. Online publication date: 2014
    • (2013) Stream mining of frequent sets with limited memory. Proceedings of the 28th Annual ACM Symposium on Applied Computing (173-175). DOI: 10.1145/2480362.2480398. Online publication date: 18-Mar-2013
    • (2013) Disk-resident high utility pattern mining: A trie structure implementation. 2013 International Conference on Information Systems and Computer Networks (44-49). DOI: 10.1109/ICISCON.2013.6524171. Online publication date: Mar-2013
    • (2013) Mining frequent itemsets from sparse data streams in limited memory environments. Proceedings of the 14th international conference on Web-Age Information Management (51-57). DOI: 10.1007/978-3-642-38562-9_5. Online publication date: 14-Jun-2013
