BISC: A bitmap itemset support counting approach for efficient frequent itemset mining

Authors:

Jinlin Chen,

Keli Xiao

ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 4, Issue 3

Article No.: 12, Pages 1 - 37

https://doi.org/10.1145/1839490.1839493

Published: 22 October 2010

Abstract

The performance of a depth-first frequent itemset (FI) mining algorithm is closely related to the total number of recursions. In previous approaches this number is mainly determined by the total number of FIs, which results in poor performance when a large number of FIs are involved. To solve this problem, a three-strategy adaptive algorithm, bitmap itemset support counting (BISC), is presented. The core strategy, BISC1, is used in the innermost steps of the recursion. For a database D with only s frequent items, a depth-first approach needs up to s levels of recursion to detect all the FIs (up to 2^s). BISC1 completely replaces these recursions with a special summation that directly calculates the supports of all 2^s possible candidate itemsets. With BISC1 the run-time is entirely independent of the database after one database scan, and the per-candidate cost is only s. To offset the exponential growth in cost (both time and space) of BISC1 as s increases, a second strategy, BISC2, is introduced to effectively double the acceptable range of s. BISC2 divides an itemset into a prefix and a suffix and improves performance by pruning all the itemsets with infrequent prefixes. If the total number of frequent items in D is high, the classic database projection strategy is used. In this case a single run of BISC (1 or 2) is applied to the first s items. For each of the remaining items, a projected database is created and the mining process proceeds recursively. To achieve optimal performance, BISC adaptively decides which strategy to use based on the dataset and the minimum support. Experiments show that BISC outperforms previous approaches on all the datasets tested. Even though this does not guarantee that BISC will always perform best, the result is impressive given that most existing algorithms are efficient only on some types of datasets. The memory usage of BISC is also comparable to that of other algorithms.
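The abstract does not spell out the summation used by BISC1, so the following is only a minimal sketch of one standard way to obtain the supports of all 2^s candidate itemsets after a single database scan. It assumes each transaction is reduced to an s-bit bitmap over the frequent items, counts how often each exact bitmap occurs, and then applies a superset-sum pass for each of the s items. The function name `count_supports` and the list-of-sets input format are hypothetical and are not taken from the paper.

```python
def count_supports(transactions, items):
    """Return a list of length 2**s whose entry at index `mask` is the
    support of the itemset encoded by `mask` (bit i set means items[i]).

    Illustrative sketch only; not the authors' implementation.
    """
    s = len(items)
    index = {item: i for i, item in enumerate(items)}

    # Single database scan: count how often each exact bitmap occurs.
    counts = [0] * (1 << s)
    for transaction in transactions:
        mask = 0
        for item in transaction:
            i = index.get(item)
            if i is not None:  # ignore items outside the chosen s items
                mask |= 1 << i
        counts[mask] += 1

    # Superset sum: after processing bit i, counts[mask] also includes
    # every bitmap that differs from mask only by having bit i set.
    # This takes s passes over 2**s entries and never touches the database.
    for i in range(s):
        bit = 1 << i
        for mask in range(1 << s):
            if not mask & bit:
                counts[mask] += counts[mask | bit]
    return counts


if __name__ == "__main__":
    db = [{"a", "b"}, {"a"}, {"a", "b", "c"}, {"b", "c"}]
    supports = count_supports(db, ["a", "b", "c"])
    print(supports)  # [4, 3, 3, 2, 2, 1, 2, 1]
```

In this sketch, index 3 (binary 011) encodes the itemset {a, b}, whose support is 2. The second phase costs s additions per candidate regardless of the number of transactions, which is consistent with the per-candidate cost of s stated in the abstract. The table itself grows as 2^s, which is the exponential cost the abstract says BISC2 and database projection are meant to offset.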
