Frequent Itemset Mining


Abstract

We present a survey of the most important algorithms proposed for frequent itemset mining. We start with an introduction and overview of the basic sequential algorithms, then discuss and compare parallel approaches based on shared memory, message passing, MapReduce, and GPU accelerators. Although the survey is not exhaustive, it covers the essential reference material; we believe that an attempt to cover everything would instead fail to convey useful information to interested readers. We hope that this work will help researchers and practitioners, in particular those coming from a business-oriented background, to quickly develop their understanding of an area likely to play an ever more significant role in the coming years.

Author information

Corresponding author

Correspondence to Massimo Cafaro.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Cafaro, M., Pulimeno, M. (2019). Frequent Itemset Mining. In: Moscato, P., de Vries, N. (eds) Business and Consumer Analytics: New Ideas. Springer, Cham. https://doi.org/10.1007/978-3-030-06222-4_6

  • DOI: https://doi.org/10.1007/978-3-030-06222-4_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-06221-7

  • Online ISBN: 978-3-030-06222-4

  • eBook Packages: Computer Science, Computer Science (R0)
