CanTree: a canonical-order tree for incremental frequent-pattern mining

Leung, Carson Kai-Sang; Khan, Quamrul I.; Li, Zhan; Hoque, Tariqul

doi:10.1007/s10115-006-0032-8

Regular Paper
Published: 05 October 2006

Volume 11, pages 287–311, (2007)

Knowledge and Information Systems

Abstract

Since its introduction, frequent-pattern mining has been the subject of numerous studies, including studies of incremental updating. Many existing incremental mining algorithms are Apriori-based and are not easily adaptable to FP-tree-based frequent-pattern mining. In this paper, we propose a novel tree structure, called CanTree (canonical-order tree), that captures the content of the transaction database and orders tree nodes according to some canonical order. By exploiting its nice properties, the CanTree can be easily maintained when database transactions are inserted, deleted, and/or modified. For example, the CanTree does not require adjustment, merging, and/or splitting of tree nodes during maintenance. No rescan of the entire updated database or reconstruction of a new tree is needed for incremental updating. Experimental results show the effectiveness of our CanTree in the incremental mining of frequent patterns. Moreover, the applicability of CanTrees is not confined to incremental mining; CanTrees are also applicable to other frequent-pattern mining tasks, including constrained mining and interactive mining.
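
To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes lexicographic item order as the canonical order (the abstract only says "some canonical order"), and the names Node, CanTreeSketch, insert, delete, and support are hypothetical. It shows why a fixed item order lets transactions be inserted and deleted by touching a single root-to-node path, with no node swapping, merging, or splitting. Header tables and the mining step itself are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Node:
    """One tree node: how many transactions pass through it, plus children."""
    count: int = 0
    children: Dict[str, "Node"] = field(default_factory=dict)


class CanTreeSketch:
    """Prefix tree whose paths follow one fixed (canonical) item order.

    Assumption: the canonical order is plain lexicographic order. Because
    the order never depends on item frequencies, an update never forces
    existing nodes to be reordered.
    """

    def __init__(self) -> None:
        self.root = Node()

    @staticmethod
    def _canonical(transaction: Iterable[str]) -> List[str]:
        # Deduplicate and sort the items into the assumed canonical order.
        return sorted(set(transaction))

    def insert(self, transaction: Iterable[str]) -> None:
        node = self.root
        for item in self._canonical(transaction):
            node = node.children.setdefault(item, Node())
            node.count += 1

    def delete(self, transaction: Iterable[str]) -> None:
        items = self._canonical(transaction)
        # Verify the path exists before changing any count. This sketch
        # trusts the caller to delete only previously inserted transactions.
        node = self.root
        for item in items:
            node = node.children.get(item)
            if node is None:
                raise KeyError(f"no path for transaction {items}")
        node = self.root
        for item in items:
            child = node.children[item]
            child.count -= 1
            if child.count == 0:
                # No remaining transaction uses this branch; drop it.
                del node.children[item]
                return
            node = child

    def support(self, itemset: Iterable[str]) -> int:
        """Count stored transactions that contain every item in itemset."""
        target = self._canonical(itemset)
        if not target:
            raise ValueError("itemset must be non-empty")

        def walk(node: Node, idx: int) -> int:
            total = 0
            for item, child in node.children.items():
                if item == target[idx]:
                    if idx == len(target) - 1:
                        total += child.count
                    else:
                        total += walk(child, idx + 1)
                elif item < target[idx]:
                    # The wanted item may still appear deeper on this path.
                    total += walk(child, idx)
                # item > target[idx]: the canonical order rules out a match.
            return total

        return walk(self.root, 0)
```

For example, inserting the transactions {c, a, b}, {b, a}, and {d, b} produces the paths a-b-c and b-d, and support(["a", "b"]) counts the two transactions that share the a-b prefix. Deleting {a, b} afterwards only decrements the counts along that prefix.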

Author information

Authors and Affiliations

Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada, R3T 2N2
Carson Kai-Sang Leung, Quamrul I. Khan, Zhan Li & Tariqul Hoque

Corresponding author

Correspondence to Carson Kai-Sang Leung.

Additional information

Carson K.-S. Leung received his B.Sc.(Honours), M.Sc., and Ph.D. degrees, all in computer science, from the University of British Columbia, Canada. Currently, he is an Assistant Professor at the University of Manitoba, Canada. His research interests include the areas of databases, data mining, and data warehousing. His work has been published in refereed journals and conferences such as ACM Transactions on Database Systems (TODS), IEEE International Conference on Data Engineering (ICDE), and IEEE International Conference on Data Mining (ICDM).

Quamrul I. Khan received his B.Sc. degree in computer science from North South University, Bangladesh, in 2001. He then worked as a Test Engineer and a Software Engineer for a few years before he started his current M.Sc. degree program in computer science at the University of Manitoba under the academic supervision of Dr. C. K.-S. Leung.

Zhan Li received her B.Eng. degree in computer engineering from Harbin Engineering University, China, in 2002. Currently, she is pursuing her M.Sc. degree in computer science at the University of Manitoba under the academic supervision of Dr. C. K.-S. Leung.

Tariqul Hoque received his B.Sc. degree in computer science from North South University, Bangladesh, in 2001. Currently, he is pursuing his M.Sc. degree in computer science at the University of Manitoba under the academic supervision of Dr. C. K.-S. Leung.

About this article

Cite this article

Leung, C.K.-S., Khan, Q.I., Li, Z. et al. CanTree: a canonical-order tree for incremental frequent-pattern mining. Knowl Inf Syst 11, 287–311 (2007). https://doi.org/10.1007/s10115-006-0032-8

Received: 30 November 2005
Revised: 23 January 2006
Accepted: 01 April 2006
Published: 05 October 2006
Issue Date: April 2007
DOI: https://doi.org/10.1007/s10115-006-0032-8
