Batch incremental processing for FP-tree construction using FP-Growth algorithm

  • Short Paper
Knowledge and Information Systems

Abstract

In today's global economy and on the World Wide Web, large, evolving, and distributed data sets can be handled efficiently by incremental data mining. Frequent patterns are central to knowledge discovery and data mining tasks such as mining association rules and correlations. The FP-tree is a versatile data structure for mining frequent patterns: it is a compact representation of a transaction database that holds the frequency information of all relevant frequent patterns (FP) in the database. All existing incremental frequent pattern mining algorithms, such as AFPIM, CATS, CanTree, CP-tree, and SPO-tree, perform incremental mining by processing one transaction of the incremental part of the database at a time and adding it to the FP-tree of the initial (original) database. In this paper, we propose a novel method that takes advantage of the FP-tree representation of the incremental transaction database. We propose a batch incremental processing algorithm, BIT_FPGrowth, that restructures and merges two small FP-trees covering consecutive durations to obtain the FP-tree used by the FP-Growth algorithm. BIT_FPGrowth uses the FP-tree as a preprocessed data repository from which it obtains transactions (i.e., item-sets), unlike the sequential incremental algorithms, which read transactions from the database. BIT_FPGrowth takes less time to construct the FP-tree. Our experimental results show that, as the size of the database increases, the runtime of BIT_FPGrowth increases far less than that of any of the other algorithms.
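The idea of treating an FP-tree as a repository of transactions can be illustrated with a small sketch. This is not the authors' BIT_FPGrowth, whose restructuring steps are not given in the abstract; it is a naive illustration under stated assumptions. The names `Node`, `build_tree`, `paths`, `merge_trees`, and `as_dict` are hypothetical. The sketch assumes that each batch tree is built with a minimum support of 1, so that no item is discarded before merging, and that ties in frequency are broken by item name.

```python
from collections import Counter


class Node:
    """One FP-tree node: an item, a count, and children keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def insert(root, items, count=1):
    """Add an ordered item list to the tree, sharing existing prefixes."""
    node = root
    for item in items:
        node = node.children.setdefault(item, Node(item))
        node.count += count


def build_tree(weighted_itemsets, min_support):
    """Build an FP-tree from (itemset, count) pairs.

    Items below min_support are dropped; the rest are ordered by
    descending frequency, with ties broken by item name (an assumption).
    """
    freq = Counter()
    for items, count in weighted_itemsets:
        for item in set(items):
            freq[item] += count
    root = Node()
    for items, count in weighted_itemsets:
        kept = [i for i in set(items) if freq[i] >= min_support]
        insert(root, sorted(kept, key=lambda i: (-freq[i], i)), count)
    return root


def paths(root):
    """Recover (itemset, count) pairs from a tree without the database.

    A node's count minus the sum of its children's counts is the number
    of transactions that end exactly at that node.
    """
    out = []
    stack = [(child, [child.item]) for child in root.children.values()]
    while stack:
        node, prefix = stack.pop()
        ending_here = node.count - sum(c.count for c in node.children.values())
        if ending_here > 0:
            out.append((prefix, ending_here))
        for child in node.children.values():
            stack.append((child, prefix + [child.item]))
    return out


def merge_trees(tree_a, tree_b, min_support):
    """Naive merge: read item-sets back from both trees, then rebuild."""
    return build_tree(paths(tree_a) + paths(tree_b), min_support)


def as_dict(node):
    """Nested-dict view of a tree, convenient for comparing two trees."""
    return {c.item: (c.count, as_dict(c)) for c in node.children.values()}


# Two consecutive batches of transactions (made-up data).
batch1 = [["a", "b"], ["b", "c", "d"], ["a", "b", "d"]]
batch2 = [["a", "b", "c"], ["b", "d"], ["e"]]

tree1 = build_tree([(t, 1) for t in batch1], min_support=1)
tree2 = build_tree([(t, 1) for t in batch2], min_support=1)

merged = merge_trees(tree1, tree2, min_support=2)
direct = build_tree([(t, 1) for t in batch1 + batch2], min_support=2)
print(as_dict(merged) == as_dict(direct))
```

In this sketch the merged tree matches the tree built directly from all transactions because the batch trees retain every item. A batch tree pruned by a support threshold would lose infrequent items, and the merge would then be lossy.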

Author information

Corresponding author

Correspondence to Shashikumar G. Totad.

About this article

Cite this article

Totad, S.G., Geeta, R.B. & Prasad Reddy, P.V.G.D. Batch incremental processing for FP-tree construction using FP-Growth algorithm. Knowl Inf Syst 33, 475–490 (2012). https://doi.org/10.1007/s10115-012-0514-9

  • DOI: https://doi.org/10.1007/s10115-012-0514-9
