Abstract
Mining association rules in relational databases is a significant computational task with many applications. A fundamental ingredient of this task is the discovery of sets of attributes (itemsets) whose frequency in the data exceeds some threshold value. In previous work [9] we introduced an approach to this problem which begins by carrying out an efficient partial computation of the necessary totals, storing these interim results in a set-enumeration tree. That work demonstrated that making use of this structure can significantly reduce the cost of determining the frequent sets.
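To make the notion of a frequent itemset concrete, the following is a minimal brute-force sketch, not the tree-based approach of [9]. The function name `frequent_itemsets`, the absolute-count threshold `min_support`, and the toy records are assumptions introduced only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset contained in at least
    min_support transactions. Exhaustive enumeration, illustration only."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cset = set(candidate)
            # Support = number of records that contain every item of the candidate.
            count = sum(1 for t in transactions if cset <= t)
            if count >= min_support:
                result[candidate] = count
    return result


# Hypothetical toy database of four records.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
frequent = frequent_itemsets(transactions, min_support=2)
```

This enumeration examines every subset of the items, so its cost grows exponentially with the number of attributes; avoiding that cost is the motivation for precomputing partial totals.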
In this paper we describe two algorithms for completing the calculation of frequent sets using an interim-support tree. These algorithms are improved versions of earlier algorithms described in the above-mentioned work and in a subsequent paper [7]. The first of our new algorithms (TTF) differs from its predecessor in that it uses a novel tree-pruning technique, based on the notion of (fixed-prefix) potential inclusion, which is specially designed for trees that are implemented using only two pointers per node. This allows the interim-support tree to be implemented in a space-efficient manner. The second algorithm (PTF) explores the idea of storing the frequent itemsets in a second tree structure, called the total support tree (T-tree); the improvement lies in the use of multiple pointers per node, which provides rapid access to the nodes of the T-tree and makes it possible to design a new, usually faster, method for updating them.
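As a rough illustration of what "two pointers per node" can mean, the sketch below stores records in a prefix tree using a first-child/next-sibling layout, and recovers a total support by summing the counts kept at the nodes. It is a simplified stand-in under stated assumptions, not the interim-support tree, TTF, or PTF as defined in the paper; the names `Node`, `insert`, and `total_support` are hypothetical, and no pruning is performed.

```python
class Node:
    """Tree node with exactly two pointers: first child and next sibling."""

    def __init__(self, item):
        self.item = item
        self.count = 0       # number of records that end exactly at this node
        self.child = None    # first child (extends the itemset by one item)
        self.sibling = None  # next sibling (same prefix, larger last item)


def insert(root, record):
    """Insert a record (a set of items) as a path of sorted items."""
    node = root
    for item in sorted(record):
        prev, cur = None, node.child
        # Sibling lists are kept sorted by item.
        while cur is not None and cur.item < item:
            prev, cur = cur, cur.sibling
        if cur is None or cur.item != item:
            new = Node(item)
            new.sibling = cur
            if prev is None:
                node.child = new
            else:
                prev.sibling = new
            cur = new
        node = cur
    node.count += 1


def total_support(root, itemset):
    """Sum the counts of all stored records that contain itemset.
    Visits the whole tree; a real algorithm would prune this walk."""
    target = set(itemset)
    total = 0
    stack = [(root.child, frozenset())]
    while stack:
        node, prefix = stack.pop()
        if node is None:
            continue
        path = prefix | {node.item}
        if target <= path:
            total += node.count
        stack.append((node.sibling, prefix))  # same prefix, next alternative
        stack.append((node.child, path))      # extend the current path
    return total


# Hypothetical toy database; the duplicate record shares a single node.
root = Node(None)
for record in [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}, {"a", "b"}]:
    insert(root, record)
support_ab = total_support(root, {"a", "b"})
```

In this layout each node costs a fixed two pointers regardless of how many children it has, at the price of a linear scan along the sibling list to find a particular child.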
Experimental comparison shows that these improvements result in considerable speedup for both algorithms. Further comparison between the two improved algorithms shows that PTF is generally faster on instances with a large number of frequent itemsets, while TTF is more appropriate whenever this number is small; in addition, TTF behaves quite well on instances in which the densities of the items of the database have a high variance.
Aris Pagourtzis and Dora Souliou were partially supported for this research by “Pythagoras” grant of the Hellenic Ministry of Education, co-funded by the European Social Fund (75%) and National Resources (25%) under Operational Programme “Education and Initial Vocational Training” (EPEAEK II).
Wojciech Rytter was supported for this research by grants 4T11C04425 and CCR-0313219.
References
[1] F. Angiulli, G. Ianni, L. Palopoli. On the complexity of inducing categorical and quantitative association rules, arXiv:cs.CC/0111009 vol. 1, Nov. 2001.
[2] R. Agrawal, T. Imielinski, and A. Swami. Mining Association Rules between Sets of Items in Large Databases. In Proc. of ACM SIGMOD Conference on Management of Data, Washington DC, May 1993.
[3] R. Agrawal, T. Imielinski, and A. Swami. Database mining: a performance perspective. IEEE Transactions on Knowledge and Data Engineering, 5(6):914–925, Dec 1993. Special Issue on Learning and Discovery in Knowledge-Based Databases.
[4] R. Agrawal and R. Srikant. Fast Algorithms for mining association rules. In VLDB’94, pp. 487–499.
[5] R. Agrawal, C. Aggarwal and V. Prasad. Depth First Generation of Long Patterns. In KDD 2000, ACM, pp. 108–118.
[6] E. Boros, V. Gurvich, L. Khachiyan, K. Makino. On the complexity of generating maximal frequent and minimal infrequent sets, in STACS 2002.
[7] F. Coenen, G. Goulbourne, and P. Leng. Computing Association Rules using Partial Totals. In L. De Raedt and A. Siebes eds, Principles of Data Mining and Knowledge Discovery (Proc 5th European Conference, PKDD 2001, Freiburg, Sept 2001), Lecture Notes in AI 2168, Springer-Verlag, Berlin, Heidelberg: pp. 54–66.
[8] F. Coenen, G. Goulbourne and P. Leng. Tree Structures for Mining Association Rules. Data Mining and Knowledge Discovery, 8 (2004), pp. 25–51.
[9] G. Goulbourne, F. Coenen and P. Leng. Algorithms for Computing Association Rules using a Partial-Support Tree. Journal of Knowledge-Based Systems 13 (2000), pp. 141–149.
[10] J. Han, J. Pei, Y. Yin and R. Mao. Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach. Data Mining and Knowledge Discovery, 8 (2004), pp. 53–87.
[11] A. Savasere, E. Omiecinski and S. Navathe. An Efficient Algorithm for Mining Association Rules in Large Databases. In VLDB 1995, pp. 432–444.
[12] H. Toivonen. Sampling Large Databases for Association Rules. In VLDB 1996, pp. 1–12.
[13] M. J. Zaki. Generating Non-Redundant Association Rules. In Proc. SIGKDD-2000, pp. 34–43, 2000.
Copyright information
© 2006 Springer-Verlag London Limited
About this paper
Cite this paper
Coenen, F., Leng, P., Pagourtzis, A., Rytter, W., Souliou, D. (2006). Improved Methods for Extracting Frequent Itemsets from Interim-Support Trees. In: Bramer, M., Coenen, F., Allen, T. (eds) Research and Development in Intelligent Systems XXII. SGAI 2005. Springer, London. https://doi.org/10.1007/978-1-84628-226-3_20
DOI: https://doi.org/10.1007/978-1-84628-226-3_20
Publisher Name: Springer, London
Print ISBN: 978-1-84628-225-6
Online ISBN: 978-1-84628-226-3
eBook Packages: Computer Science, Computer Science (R0)