HashEclat: an efficient frequent itemset algorithm

Zhang, Chunkai; Tian, Panbo; Zhang, Xudong; Liao, Qing; Jiang, Zoe L.; Wang, Xuan

doi:10.1007/s13042-018-00918-x


Original Article
Published: 04 January 2019

Volume 10, pages 3003–3016 (2019)

International Journal of Machine Learning and Cybernetics


Abstract

The Eclat algorithm is one of the most widely used frequent itemset mining methods. However, computing the intersections of itemsets is inefficient, which makes Eclat time-consuming, especially when the dataset contains a large number of transactions. To improve efficiency, we propose an approximate Eclat algorithm named HashEclat, based on MinHash, which quickly estimates the size of an intersection set; its parameters k, E and minSup can be adjusted to trade off the accuracy of the mining results against execution time. The parameter k is the top-k parameter of the one-permutation MinHash algorithm, E is the estimation error of a single intersection size, and minSup is the minimum support threshold. In many real situations, an approximate result obtained faster may be more useful than an ‘exact’ result. The theoretical analysis and experimental results we present demonstrate that the proposed algorithm outputs almost all of the frequent itemsets, faster and with less memory.
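The core idea described above, replacing Eclat's exact intersection of transaction-id sets (tidsets) with a MinHash estimate of the intersection size, can be illustrated with a minimal Python sketch. It assumes a bottom-k sketch (the k smallest hash values of each tidset under one fixed hash function) as a stand-in for the paper's one-permutation MinHash, and uses the identity |A ∩ B| = J(|A| + |B|)/(1 + J) to turn a Jaccard estimate J into a support estimate. The function names `tid_hash`, `bottom_k`, `estimate_intersection` and `frequent_pairs` are hypothetical, and the code covers only the pair level of the search, not the full HashEclat algorithm or its error parameter E.

```python
import hashlib
from itertools import combinations


def tid_hash(tid: int) -> int:
    """64-bit hash of a transaction id (one fixed hash acts as one 'permutation')."""
    digest = hashlib.blake2b(str(tid).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_k(tids, k):
    """Bottom-k MinHash sketch (an assumed stand-in): the k smallest hashes of a tidset."""
    return sorted(tid_hash(t) for t in tids)[:k]


def estimate_intersection(sk_a, sk_b, size_a, size_b, k):
    """Estimate |A ∩ B| from the sketches of A and B plus their exact sizes."""
    set_a, set_b = set(sk_a), set(sk_b)
    # The k smallest values of the two sketches combined form a sketch of A ∪ B.
    union_sketch = sorted(set_a | set_b)[:k]
    if not union_sketch:
        return 0.0
    # A value in the union sketch belongs to A ∩ B iff it appears in both sketches.
    shared = sum(1 for v in union_sketch if v in set_a and v in set_b)
    jaccard = shared / len(union_sketch)
    # Convert the Jaccard estimate into an intersection size: J * (|A| + |B|) / (1 + J).
    return jaccard * (size_a + size_b) / (1.0 + jaccard)


def frequent_pairs(vertical, k, min_sup):
    """One Eclat-style level (hypothetical helper): join every pair of frequent
    items, estimating each candidate's support from sketches instead of
    intersecting the tidsets exactly."""
    frequent = {item: tids for item, tids in vertical.items() if len(tids) >= min_sup}
    sketches = {item: bottom_k(tids, k) for item, tids in frequent.items()}
    result = {}
    for a, b in combinations(sorted(frequent), 2):
        est = estimate_intersection(sketches[a], sketches[b],
                                    len(frequent[a]), len(frequent[b]), k)
        if est >= min_sup:
            result[(a, b)] = est
    return result


if __name__ == "__main__":
    # Toy vertical database: item -> tidset.
    vertical = {"a": {1, 2, 3, 4}, "b": {2, 3, 4}, "c": {1, 5}}
    print(frequent_pairs(vertical, k=16, min_sup=2))  # prints {('a', 'b'): 3.0}
```

A larger k gives a tighter estimate at the cost of more memory per item, which mirrors the accuracy/time trade-off attributed to k above; when k is at least the size of the union of two tidsets, the sketch holds every element and the estimate is exact, as in the toy example.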

Similar content being viewed by others

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

Article 12 April 2024

Incremental feature selection approach to multi-dimensional variation based on matrix dominance conditional entropy for ordered data set

Article 10 April 2024

Automatic clustering algorithms: a systematic review and bibliometric analysis of relevant literature

Article 10 October 2020


Acknowledgements

This study was supported by the Shenzhen Research Council (nos. JSGG20170822160842949 and JCYJ20170307151518535).

Author information

Authors and Affiliations

Department of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
Chunkai Zhang, Panbo Tian, Xudong Zhang, Qing Liao, Zoe L. Jiang & Xuan Wang


Corresponding author

Correspondence to Chunkai Zhang.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Zhang, C., Tian, P., Zhang, X. et al. HashEclat: an efficient frequent itemset algorithm. Int. J. Mach. Learn. & Cyber. 10, 3003–3016 (2019). https://doi.org/10.1007/s13042-018-00918-x


Received: 01 November 2017
Accepted: 26 December 2018
Published: 04 January 2019
Issue Date: November 2019
DOI: https://doi.org/10.1007/s13042-018-00918-x
