Abstract
In today's global economy and World Wide Web, large sets of evolving, distributed data can be handled efficiently by incremental data mining. Frequent patterns are central to knowledge discovery and data mining tasks such as mining association rules and correlations. The FP-tree is a versatile data structure for frequent pattern mining: it is a compact representation of a transaction database that holds the frequency information of all relevant frequent patterns (FP) in the database. All existing incremental frequent pattern mining algorithms, such as AFPIM, CATS, CanTree, CP-tree, and SPO-tree, perform incremental mining by processing the incremental part of the database one transaction at a time and updating the FP-tree of the initial (original) database with it. In this paper, we propose a novel method that takes advantage of the FP-tree representation of the incremental transaction database for incremental mining. We propose a batch incremental processing algorithm, BIT_FPGrowth, that restructures and merges two small FP-trees covering consecutive durations to obtain the FP-tree used by the FP-Growth algorithm. BIT_FPGrowth uses the FP-tree as a preprocessed data repository from which it obtains transactions (i.e., item-sets), unlike other sequential incremental algorithms, which read transactions from the database. BIT_FPGrowth takes less time to construct the FP-tree. Our experimental results show that, as the size of the database increases, the runtime of BIT_FPGrowth grows much less than that of the other algorithms and remains the lowest among them.
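The idea of treating an FP-tree as a source of item-sets and merging two trees can be illustrated with a small Python sketch. It is not the BIT_FPGrowth procedure itself, whose restructuring steps the abstract does not spell out. It assumes no minimum-support pruning, orders items by descending total frequency with ties broken by item name, and uses hypothetical names throughout (`Node`, `build_fp_tree`, `itemsets`, `merge_fp_trees`).

```python
from collections import Counter


class Node:
    """One FP-tree node: an item, its count, and child links."""
    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def insert(root, items, count=1):
    """Insert an already-ordered item list into the tree with the given count."""
    node = root
    for item in items:
        node = node.children.setdefault(item, Node(item))
        node.count += count


def build_fp_tree(transactions):
    """Build a tree from raw transactions (assumption: no support pruning)."""
    freq = Counter(item for t in transactions for item in set(t))
    root = Node()
    for t in transactions:
        insert(root, sorted(set(t), key=lambda i: (-freq[i], i)))
    return root


def itemsets(node, prefix=()):
    """Yield (item-set, count) for the transactions that end at each node."""
    for child in node.children.values():
        path = prefix + (child.item,)
        ending = child.count - sum(c.count for c in child.children.values())
        if ending > 0:
            yield path, ending
        yield from itemsets(child, path)


def item_counts(root):
    """Total support of each item, summed over all nodes carrying that item."""
    totals = Counter()
    stack = list(root.children.values())
    while stack:
        node = stack.pop()
        totals[node.item] += node.count
        stack.extend(node.children.values())
    return totals


def merge_fp_trees(tree_a, tree_b):
    """Merge two trees by re-inserting their item-sets in the combined order."""
    freq = item_counts(tree_a) + item_counts(tree_b)
    merged = Node()
    for tree in (tree_a, tree_b):
        for items, count in itemsets(tree):
            insert(merged, sorted(items, key=lambda i: (-freq[i], i)), count)
    return merged


def as_dict(node):
    """Nested-dict view of a tree, handy for comparing two trees."""
    return {c.item: (c.count, as_dict(c)) for c in node.children.values()}
```

Under these assumptions, merging the trees of two batches gives the same tree as building one tree from the concatenated batches, while each distinct item-set is re-inserted once with its count instead of once per transaction.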
Cite this article
Totad, S.G., Geeta, R.B. & Prasad Reddy, P.V.G.D. Batch incremental processing for FP-tree construction using FP-Growth algorithm. Knowl Inf Syst 33, 475–490 (2012). https://doi.org/10.1007/s10115-012-0514-9
DOI: https://doi.org/10.1007/s10115-012-0514-9