Frequent Itemset Mining

Cafaro, Massimo; Pulimeno, Marco

doi:10.1007/978-3-030-06222-4_6

Abstract

We survey the most important algorithms proposed for frequent itemset mining. We begin with an introduction and an overview of the basic sequential algorithms, then discuss and compare parallel approaches based on shared memory, message passing, MapReduce, and GPU accelerators. The survey is not exhaustive, but it covers the essential reference material; we believe that an attempt to cover everything would fail to convey useful information to interested readers. We hope this work will help researchers and practitioners, particularly those from a business-oriented background, quickly develop their understanding of an area likely to play an increasingly significant role in the coming years.

Author information

Authors and Affiliations

University of Salento, Lecce, Italy
Massimo Cafaro & Marco Pulimeno

Corresponding author

Correspondence to Massimo Cafaro.

Editor information

Editors and Affiliations

School of Electrical Engineering and Computing, The University of Newcastle, Callaghan, NSW, Australia
Pablo Moscato
School of Electrical Engineering and Computing, The University of Newcastle, Callaghan, NSW, Australia
Natalie Jane de Vries

About this chapter

Cite this chapter

Cafaro, M., Pulimeno, M. (2019). Frequent Itemset Mining. In: Moscato, P., de Vries, N. (eds) Business and Consumer Analytics: New Ideas. Springer, Cham. https://doi.org/10.1007/978-3-030-06222-4_6

DOI: https://doi.org/10.1007/978-3-030-06222-4_6
Published: 31 May 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-06221-7
Online ISBN: 978-3-030-06222-4
eBook Packages: Computer Science, Computer Science (R0)
