Abstract
Efficient algorithms for mining frequent itemsets are crucial for mining association rules and for many other data mining tasks. It is well known that countTable is one of the most important facilities for exploiting the subset property to compress the transaction database into a smaller representation of item occurrences. One of the biggest problems with this technique is the cost of candidate generation and testing, the two most important steps in finding association rules. In this paper, we develop this method further so that the costly candidate-generation-and-test processing is avoided completely. Moreover, the proposed methods compress the crucial information about all itemsets, including the maximal-length and minimal-length frequent itemsets, and avoid expensive, repeated database scans. The proposed algorithms, named CountTableFI and BinaryCountTableF, differ significantly from Apriori and from all other algorithms that extend Apriori. The idea behind them lies in the representation of the transactions: we represent every transaction as a binary number and as a decimal number, which makes it simple and fast to apply the subset and identical-set properties. A comprehensive performance study shows that our techniques are efficient and scalable compared with other methods.
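The representation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' CountTableFI or BinaryCountTableF procedure, whose details are not given here. It only assumes that each transaction is encoded as an integer bitmask, that identical transactions are merged into a single counted entry, and that the support of an itemset is obtained with a bitwise subset test. The helper names `encode`, `build_count_table`, and `support`, and the toy data, are hypothetical.

```python
from collections import Counter

def encode(transaction, item_index):
    """Map a transaction (iterable of items) to an integer bitmask."""
    mask = 0
    for item in transaction:
        mask |= 1 << item_index[item]
    return mask

def build_count_table(transactions, item_index):
    """Merge identical transactions: bitmask -> number of occurrences."""
    return Counter(encode(t, item_index) for t in transactions)

def support(itemset_mask, count_table):
    """Add up the counts of all stored transactions that contain the itemset."""
    return sum(
        count
        for mask, count in count_table.items()
        if (mask & itemset_mask) == itemset_mask  # subset test on bitmasks
    )

# Toy database (hypothetical): item a -> bit 0, b -> bit 1, c -> bit 2, d -> bit 3
items = ["a", "b", "c", "d"]
item_index = {item: i for i, item in enumerate(items)}
transactions = [
    {"a", "b"},
    {"a", "b"},
    {"a", "b", "c"},
    {"c", "d"},
    {"a", "b"},
]

table = build_count_table(transactions, item_index)
ab = encode({"a", "b"}, item_index)
print(dict(table))         # three distinct bitmasks instead of five transactions
print(support(ab, table))  # number of transactions containing both a and b
```

In this sketch a single pass over the database builds the table, and every later support query touches only the distinct bitmasks. This is one way to read the abstract's claim about avoiding repeated database scans.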






Cite this article
Mohamed, M.H., Darwieesh, M.M. Efficient mining frequent itemsets algorithms. Int. J. Mach. Learn. & Cyber. 5, 823–833 (2014). https://doi.org/10.1007/s13042-013-0172-6
DOI: https://doi.org/10.1007/s13042-013-0172-6