BISC: A bitmap itemset support counting approach for efficient frequent itemset mining

Published: 22 October 2010

Abstract

The performance of a depth-first frequent itemset (FI) mining algorithm is closely related to the total number of recursions. In previous approaches this number is mainly determined by the total number of FIs, which results in poor performance when a large number of FIs are involved. To solve this problem, a three-strategy adaptive algorithm, bitmap itemset support counting (BISC), is presented. The core strategy, BISC1, is used in the innermost steps of the recursion. For a database D with only s frequent items, a depth-first approach needs up to s levels of recursion to detect all the FIs (up to 2^s). BISC1 completely replaces these recursions with a special summation that directly calculates the supports of all 2^s possible candidate itemsets. With BISC1 the run-time is entirely independent of the database after one database scan, and the per-candidate cost is only s. To offset the exponential growth in cost (both time and space) of BISC1 as s increases, a second strategy, BISC2, is introduced to effectively double the acceptable range of s. BISC2 divides an itemset into a prefix and a suffix and improves performance by pruning all the itemsets with infrequent prefixes. If the total number of frequent items in D is high, the classic database projection strategy is used. In this case, a single run of BISC (1 or 2) is applied to the first s items. For each of the remaining items, a projected database is created and the mining process proceeds recursively. To achieve optimal performance, BISC adaptively decides which strategy to use based on the dataset and the minimum support. Experiments show that BISC outperforms previous approaches on all the datasets tested. Even though this does not guarantee that BISC will always perform best, the result is impressive given that most existing algorithms are efficient only on some types of datasets. The memory usage of BISC is also comparable to that of other algorithms.
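
The abstract does not spell out the "special summation" used by BISC1. As a rough illustration only, the sketch below uses a standard superset-sum pass over item bitmaps, which has the same stated cost profile: one database scan, then about s operations for each of the 2^s candidates. The function name `count_supports` and the encoding of items as integers 0..s-1 are assumptions made for this example, not details taken from the paper.

```python
from typing import List


def count_supports(transactions: List[List[int]], s: int) -> List[int]:
    """Return a list where entry `mask` is the support of the itemset
    whose items are the set bits of `mask` (items are 0..s-1).

    Hypothetical helper for illustration; not the paper's actual code.
    """
    counts = [0] * (1 << s)

    # Single database scan: turn each transaction into a bitmap and
    # count how many transactions have exactly that bitmap.
    for transaction in transactions:
        mask = 0
        for item in transaction:
            if 0 <= item < s:
                mask |= 1 << item
        counts[mask] += 1

    # Superset sum: for each item in turn, add the count of the bitmap
    # that additionally contains this item. After all s passes,
    # counts[mask] covers every transaction that is a superset of mask,
    # which is the support of mask. This phase never touches the database.
    for i in range(s):
        bit = 1 << i
        for mask in range(1 << s):
            if not mask & bit:
                counts[mask] += counts[mask | bit]

    return counts


if __name__ == "__main__":
    db = [[0, 1], [0, 2], [0, 1, 2], [1]]
    supports = count_supports(db, 3)
    min_support = 2
    frequent = [m for m in range(1, 1 << 3) if supports[m] >= min_support]
    print(supports)
    print([bin(m) for m in frequent])
```

The second loop depends only on s, not on the number of transactions, which is the property the abstract attributes to BISC1. The table of 2^s counters is also where the exponential time and space cost mentioned in the abstract comes from.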

    Published In

    ACM Transactions on Knowledge Discovery from Data, Volume 4, Issue 3
    October 2010
    191 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/1839490
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 October 2010
    Accepted: 01 December 2009
    Revised: 01 March 2009
    Received: 01 October 2008
    Published in TKDD Volume 4, Issue 3

    Author Tags

    1. Data mining algorithms
    2. association rule mining
    3. frequent itemset mining

    Qualifiers

    • Research-article
    • Research
    • Refereed
