Efficient mining frequent itemsets algorithms

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Efficient algorithms for mining frequent itemsets are crucial for mining association rules and for many other data mining tasks. The countTable is one of the most important facilities for exploiting the subset property to compress a transaction database into a new, smaller representation of item occurrences. One of the biggest problems with this technique is the cost of candidate generation and testing, the two most important steps in finding association rules. In this paper, we extend this method to avoid the costly candidate-generation-and-test processing completely. Moreover, the proposed methods compress crucial information about all itemsets, including maximal-length and minimal-length frequent itemsets, and avoid expensive, repeated database scans. We present the proposed algorithms, named CountTableFI and BinaryCountTableF, which differ significantly from Apriori and from all other algorithms extended from Apriori. The idea behind them lies in the representation of the transactions: we represent every transaction as a binary number and as a decimal number, so that the subset and identical-set properties are simple and fast to use. A comprehensive performance study shows that our techniques are efficient and scalable compared with other methods.
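The representation idea in the abstract can be illustrated with a minimal Python sketch. It is not the CountTableFI or BinaryCountTableF algorithm, whose details are not given here. It only assumes that each item is assigned one bit, so a transaction becomes a binary number (equivalently, a decimal integer), identical transactions collapse into a single counted entry, and a subset test reduces to a bitwise AND. The names `encode`, `build_count_table`, `support`, and the sample data are hypothetical.

```python
from collections import Counter


def encode(itemset, item_index):
    """Pack a set of items into one integer, one bit per item (assumed encoding)."""
    mask = 0
    for item in itemset:
        mask |= 1 << item_index[item]
    return mask


def build_count_table(transactions, item_index):
    """Collapse identical transactions: map each encoded transaction to its count."""
    return Counter(encode(t, item_index) for t in transactions)


def support(itemset, count_table, item_index):
    """Sum the counts of all encoded transactions that contain the itemset.

    An itemset is a subset of a transaction exactly when
    (transaction_mask & itemset_mask) == itemset_mask.
    """
    target = encode(itemset, item_index)
    return sum(n for mask, n in count_table.items() if mask & target == target)


# Hypothetical toy database with four items.
items = ["a", "b", "c", "d"]
item_index = {item: i for i, item in enumerate(items)}
transactions = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c", "d"}, {"a"}]

table = build_count_table(transactions, item_index)
print(dict(table))                             # encoded transaction -> count
print(support({"a", "b"}, table, item_index))  # support of {a, b}
```

In this toy database the two identical transactions {a, b} share the single key 3 (binary 0011), so the table holds four entries for five transactions, and each support query scans the table once instead of rescanning the raw database.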



Author information

Corresponding author

Correspondence to Marghny H. Mohamed.

About this article

Cite this article

Mohamed, M.H., Darwieesh, M.M. Efficient mining frequent itemsets algorithms. Int. J. Mach. Learn. & Cyber. 5, 823–833 (2014). https://doi.org/10.1007/s13042-013-0172-6

  • DOI: https://doi.org/10.1007/s13042-013-0172-6
