Improved Methods for Extracting Frequent Itemsets from Interim-Support Trees

Coenen, Frans; Leng, Paul; Pagourtzis, Aris; Rytter, Wojciech; Souliou, Dora

doi:10.1007/978-1-84628-226-3_20

Included in the following conference series:

International Conference on Innovative Techniques and Applications of Artificial Intelligence

Abstract

Mining association rules in relational databases is a significant computational task with many applications. A fundamental ingredient of this task is the discovery of sets of attributes (itemsets) whose frequency in the data exceeds some threshold value. In previous work [9] we introduced an approach to this problem that begins by carrying out an efficient partial computation of the necessary totals, storing these interim results in a set-enumeration tree. That work demonstrated that using this structure can significantly reduce the cost of determining the frequent sets.
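
As a point of reference for the problem stated above, the following is a minimal brute-force sketch of frequent-itemset discovery. It is not the approach of [9]: it enumerates every subset of every row, which is exactly the cost that tree-based methods try to avoid. The function name `frequent_itemsets`, the parameter `min_support`, the sample rows, and the convention that an itemset is frequent when its count is at least the threshold are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets occurring in >= min_support rows.

    Assumption: "frequent" means count >= min_support; the source only
    says the frequency must exceed some threshold value.
    """
    counts = {}
    for row in transactions:
        items = sorted(set(row))  # canonical order, duplicates removed
        # Count every non-empty subset of the row once (exponential in row size).
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[subset] = counts.get(subset, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_support}


# Hypothetical toy database of four rows.
rows = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
result = frequent_itemsets(rows, min_support=2)
```

With the toy rows above, the itemsets {b, c} and {a, b, c} each occur in only one row and are therefore filtered out.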

In this paper we describe two algorithms for completing the calculation of frequent sets using an interim-support tree. These algorithms are improved versions of earlier algorithms described in the above-mentioned work and in a subsequent paper [7]. The first of our new algorithms (TTF) differs from its ancestor in that it uses a novel tree pruning technique, based on the notion of (fixed-prefix) potential inclusion, which is specially designed for trees that are implemented using only two pointers per node. This allows the interim-support tree to be implemented in a space-efficient manner. The second algorithm (PTF) explores the idea of storing the frequent itemsets in a second tree structure, called the total support tree (T-tree); the improvement lies in the use of multiple pointers per node, which provides rapid access to the nodes of the T-tree and makes it possible to design a new, usually faster, method for updating them.
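
To make the phrase "two pointers per node" concrete, here is a minimal sketch of a prefix tree stored in first-child/next-sibling form, with a per-node count acting as a partial total. It is a simplified stand-in, not the authors' interim-support tree, and the pruning in `support` is a plain item-ordering cut-off, not the (fixed-prefix) potential inclusion technique of TTF. The names `Node`, `insert`, and `support` are hypothetical, and `support` assumes a non-empty itemset.

```python
class Node:
    """Prefix-tree node holding only two pointers: first child, next sibling."""

    def __init__(self, item):
        self.item = item
        self.count = 0       # rows whose sorted item list passes through this node
        self.child = None    # first child (smallest item)
        self.sibling = None  # next sibling (next larger item)


def insert(root, row):
    """Add one row, keeping every sibling chain sorted by item."""
    node = root
    for item in sorted(set(row)):
        prev, cur = None, node.child
        while cur is not None and cur.item < item:
            prev, cur = cur, cur.sibling
        if cur is None or cur.item != item:
            new = Node(item)
            new.sibling = cur
            if prev is None:
                node.child = new
            else:
                prev.sibling = new
            cur = new
        cur.count += 1
        node = cur


def support(root, itemset):
    """Total support of a non-empty itemset, summed from the stored counts."""
    target = sorted(set(itemset))
    total = 0
    stack = [(root.child, 0)]  # (head of a sibling chain, target items matched)
    while stack:
        node, matched = stack.pop()
        while node is not None:
            if node.item == target[matched]:
                if matched + 1 == len(target):
                    total += node.count  # every row through here contains the itemset
                else:
                    stack.append((node.child, matched + 1))
            elif node.item < target[matched]:
                stack.append((node.child, matched))
            else:
                break  # this and all later siblings skip the needed item
            node = node.sibling
    return total


# Hypothetical toy database of four rows.
root = Node(None)
for row in [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]:
    insert(root, row)
```

Because each sibling chain is sorted, a row containing the itemset contributes to exactly one node: the one at which the last item of the itemset is reached. Summing the counts of those nodes gives the total support without revisiting the rows.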

Experimental comparison shows that these improvements result in considerable speedup for both algorithms. Further comparison between the two improved algorithms shows that PTF is generally faster on instances with a large number of frequent itemsets, while TTF is more appropriate whenever this number is small; in addition, TTF behaves quite well on instances in which the densities of the items of the database have a high variance.

Aris Pagourtzis and Dora Souliou were partially supported for this research by the “Pythagoras” grant of the Hellenic Ministry of Education, co-funded by the European Social Fund (75%) and National Resources (25%) under the Operational Programme “Education and Initial Vocational Training” (EPEAEK II).

Wojciech Rytter was supported for this research by grants 4T11C04425 and CCR-0313219.

References

[1] F. Angiulli, G. Ianni, and L. Palopoli. On the complexity of inducing categorical and quantitative association rules. arXiv:cs.CC/0111009, vol. 1, Nov. 2001.
[2] R. Agrawal, T. Imielinski, and A. Swami. Mining Association Rules between Sets of Items in Large Databases. In Proc. of ACM SIGMOD Conference on Management of Data, Washington DC, May 1993.
[3] R. Agrawal, T. Imielinski, and A. Swami. Database mining: a performance perspective. IEEE Transactions on Knowledge and Data Engineering, 5(6):914–925, Dec 1993. Special Issue on Learning and Discovery in Knowledge-Based Databases.
[4] R. Agrawal and R. Srikant. Fast Algorithms for mining association rules. In VLDB’94, pp. 487–499.
[5] R. Agrawal, C. Aggarwal, and V. Prasad. Depth First Generation of Long Patterns. In KDD 2000, ACM, pp. 108–118.
[6] E. Boros, V. Gurvich, L. Khachiyan, and K. Makino. On the complexity of generating maximal frequent and minimal infrequent sets. In STACS 2002.
[7] F. Coenen, G. Goulbourne, and P. Leng. Computing Association Rules using Partial Totals. In L. De Raedt and A. Siebes, eds, Principles of Data Mining and Knowledge Discovery (Proc 5th European Conference, PKDD 2001, Freiburg, Sept 2001), Lecture Notes in AI 2168, Springer-Verlag, Berlin, Heidelberg, pp. 54–66.
[8] F. Coenen, G. Goulbourne, and P. Leng. Tree Structures for Mining Association Rules. Data Mining and Knowledge Discovery, 8 (2004), pp. 25–51.
[9] G. Goulbourne, F. Coenen, and P. Leng. Algorithms for Computing Association Rules using a Partial-Support Tree. Journal of Knowledge-Based Systems, 13 (2000), pp. 141–149.
[10] J. Han, J. Pei, Y. Yin, and R. Mao. Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach. Data Mining and Knowledge Discovery, 8 (2004), pp. 53–87.
[11] A. Savasere, E. Omiecinski, and S. Navathe. An Efficient Algorithm for Mining Association Rules in Large Databases. In VLDB 1995, pp. 432–444.
[12] H. Toivonen. Sampling Large Databases for Association Rules. In VLDB 1996, pp. 1–12.
[13] M. J. Zaki. Generating Non-Redundant Association Rules. In Proc. SIGKDD-2000, pp. 34–43, 2000.

Author information

Authors and Affiliations

Department of Computer Science, University of Liverpool, Chadwick Building, Peach Street, L69 7ZF, UK
Frans Coenen & Paul Leng
Department of Computer Science, National Technical University of Athens, 15780, Zografou, Athens, Greece
Aris Pagourtzis & Dora Souliou
Institute of Informatics, Warsaw University, Poland and Department of Computer Science, New Jersey Institute of Technology, US
Wojciech Rytter

Editor information

Editors and Affiliations

Faculty of Technology, University of Portsmouth, Portsmouth, UK
Max Bramer BSc, PhD, CEng, FBCS, FIEE, FRSA
Department of Computer Science, University of Liverpool, Liverpool, UK
Frans Coenen PhD
Nottingham Trent University, UK
Tony Allen PhD

About this paper

Cite this paper

Coenen, F., Leng, P., Pagourtzis, A., Rytter, W., Souliou, D. (2006). Improved Methods for Extracting Frequent Itemsets from Interim-Support Trees. In: Bramer, M., Coenen, F., Allen, T. (eds) Research and Development in Intelligent Systems XXII. SGAI 2005. Springer, London. https://doi.org/10.1007/978-1-84628-226-3_20

DOI: https://doi.org/10.1007/978-1-84628-226-3_20
Publisher Name: Springer, London
Print ISBN: 978-1-84628-225-6
Online ISBN: 978-1-84628-226-3
eBook Packages: Computer Science, Computer Science (R0)
