Improved Methods for Extracting Frequent Itemsets from Interim-Support Trees

  • Conference paper
Research and Development in Intelligent Systems XXII (SGAI 2005)

Abstract

Mining association rules in relational databases is a significant computational task with many applications. A fundamental ingredient of this task is the discovery of sets of attributes (itemsets) whose frequency in the data exceeds some threshold value. In previous work [9] we introduced an approach to this problem which begins by carrying out an efficient partial computation of the necessary totals, storing these interim results in a set-enumeration tree. This work demonstrated that making use of this structure can significantly reduce the cost of determining the frequent sets.
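As a rough illustration of how interim totals stored in a set-enumeration tree can yield full supports, here is a minimal Python sketch. It rests on our own simplifying assumptions: items are integers in a fixed order, and the tree is a plain, uncompressed prefix tree in which each node counts the transactions that begin with the node's itemset; the structure used in [9] is more compact. The names `Node`, `build_interim_tree`, `total_support` and `frequent_itemsets` are hypothetical. The total support of an itemset A is obtained by summing the interim counts of the nodes whose itemset ends in the largest item of A and contains A: each transaction containing A has exactly one prefix ending in that item, and that prefix contains A. Candidate itemsets are enumerated by brute force, level by level, which is not how TTF or PTF proceed.

```python
from itertools import combinations


class Node:
    """Node of a simplified interim-support tree (an uncompressed prefix tree).

    The node represents the itemset spelled by the path from the root; `count`
    is the number of transactions that have this itemset as a prefix.
    """

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}  # item -> Node; child items are larger than self.item


def build_interim_tree(transactions):
    """Insert every transaction (in sorted item order) and count prefixes."""
    root = Node()
    for transaction in transactions:
        node = root
        for item in sorted(set(transaction)):
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root


def total_support(root, itemset):
    """Sum the interim counts of nodes that end in max(itemset) and contain itemset."""
    target = set(itemset)
    last = max(target)
    total = 0
    stack = [(root, frozenset())]
    while stack:
        node, path = stack.pop()
        for item, child in node.children.items():
            if item > last:
                continue  # descendants only add larger items, so none ends at `last`
            child_path = path | {item}
            if item == last:
                if target <= child_path:
                    total += child.count
            else:
                stack.append((child, child_path))
    return total


def frequent_itemsets(transactions, min_support):
    """Brute-force, level-by-level search (not TTF or PTF) using the tree for counting."""
    root = build_interim_tree(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, k):
            support = total_support(root, candidate)
            if support >= min_support:
                result[candidate] = support
                found_any = True
        if not found_any:
            break  # no frequent k-itemset means no frequent (k+1)-itemset
    return result


if __name__ == "__main__":
    data = [[1, 2, 3], [1, 3], [2, 3], [1, 2]]
    print(frequent_itemsets(data, min_support=2))
```

For the four transactions in the `__main__` block with a threshold of 2, this should print the six frequent itemsets {1}, {2}, {3} (support 3 each) and {1, 2}, {1, 3}, {2, 3} (support 2 each); {1, 2, 3} has support 1 and is rejected.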

In this paper we describe two algorithms for completing the calculation of frequent sets using an interim-support tree. These algorithms are improved versions of earlier algorithms described in the above-mentioned work and in a subsequent paper [7]. The first of our new algorithms (TTF) differs from its ancestor in that it uses a novel tree pruning technique, based on the notion of (fixed-prefix) potential inclusion, which is specially designed for trees that are implemented using only two pointers per node. This allows the interim-support tree to be implemented in a space-efficient manner. The second algorithm (PTF) explores the idea of storing the frequent itemsets in a second tree structure, called the total support tree (T-tree); the improvement lies in the use of multiple pointers per node, which provides rapid access to the nodes of the T-tree and makes it possible to design a new, usually faster, method for updating them.
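The contrast between a two-pointer tree and a multi-pointer tree can be sketched in Python as follows. This is an illustration under our own assumptions, not the authors' code. The interim-support tree keeps each node's children as a sibling list in ascending item order, so every node holds just a first-child and a next-sibling link. The pruning rule shown is a simple inclusion test (abandon a sibling list once its items pass the next item still needed from the target set); it is in the spirit of potential inclusion but is not the paper's fixed-prefix technique, whose details are not reproduced here. The T-tree part assumes items are integers 0 to n-1, so each node can hold one child slot per item and a lookup is one direct index per item. All class and function names are hypothetical.

```python
class TwoPointerNode:
    """Interim-support tree node with only two links: first child and next sibling."""

    __slots__ = ("item", "count", "child", "sibling")

    def __init__(self, item):
        self.item = item
        self.count = 0        # interim (prefix) support
        self.child = None     # first child, i.e. the child with the smallest item
        self.sibling = None   # next sibling, whose item is larger


def insert(root, transaction):
    """Add one transaction, keeping every sibling list sorted by item."""
    node = root
    for item in sorted(set(transaction)):
        prev, cur = None, node.child
        while cur is not None and cur.item < item:
            prev, cur = cur, cur.sibling
        if cur is None or cur.item != item:
            new = TwoPointerNode(item)
            new.sibling = cur
            if prev is None:
                node.child = new
            else:
                prev.sibling = new
            cur = new
        cur.count += 1
        node = cur


def total_support(root, itemset):
    """Sum interim counts, pruning sibling lists that can no longer contain itemset."""
    target = sorted(set(itemset))
    total = 0
    stack = [(root.child, 0)]  # (first node of a sibling list, items of target matched)
    while stack:
        node, i = stack.pop()
        # Once an item exceeds target[i], this node and its later siblings skip
        # target[i], and their descendants only add larger items: stop here.
        while node is not None and node.item <= target[i]:
            if node.item == target[i]:
                if i + 1 == len(target):
                    total += node.count  # path ends in max(target) and contains target
                else:
                    stack.append((node.child, i + 1))
            else:
                stack.append((node.child, i))  # item not in target; keep looking below
            node = node.sibling
    return total


class TTreeNode:
    """Total-support tree node with one child slot per item (items are 0..n-1)."""

    def __init__(self, num_items):
        self.support = None
        self.children = [None] * num_items


def ttree_store(root, itemset, support):
    """Record the total support of itemset, creating nodes along its path."""
    node = root
    for item in sorted(itemset):
        if node.children[item] is None:
            node.children[item] = TTreeNode(len(root.children))
        node = node.children[item]
    node.support = support


def ttree_lookup(root, itemset):
    """Follow one direct index per item; return None if the set was never stored."""
    node = root
    for item in sorted(itemset):
        node = node.children[item]
        if node is None:
            return None
    return node.support
```

The trade-off is visible in the two node classes: a `TwoPointerNode` costs two references whatever its fan-out, but reaching a child means walking a sibling list, whereas a `TTreeNode` spends `num_items` slots so that each step of a lookup or update is a single index operation.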

Experimental comparison shows that these improvements result in considerable speedup for both algorithms. Further comparison between the two improved algorithms shows that PTF is generally faster on instances with a large number of frequent itemsets, while TTF is more appropriate whenever this number is small; in addition, TTF performs quite well on instances in which the densities of the items of the database have a high variance.

Aris Pagourtzis and Dora Souliou were partially supported for this research by the “Pythagoras” grant of the Hellenic Ministry of Education, co-funded by the European Social Fund (75%) and National Resources (25%) under Operational Programme “Education and Initial Vocational Training” (EPEAEK II).

Wojciech Rytter was supported for this research by grants 4T11C04425 and CCR-0313219.

References

  1. F. Angiulli, G. Ianni and L. Palopoli. On the complexity of inducing categorical and quantitative association rules. arXiv:cs.CC/0111009, vol. 1, Nov. 2001.

  2. R. Agrawal, T. Imielinski and A. Swami. Mining Association Rules between Sets of Items in Large Databases. In Proc. ACM SIGMOD Conference on Management of Data, Washington DC, May 1993.

  3. R. Agrawal, T. Imielinski and A. Swami. Database mining: a performance perspective. IEEE Transactions on Knowledge and Data Engineering, 5(6):914–925, Dec 1993. Special Issue on Learning and Discovery in Knowledge-Based Databases.

  4. R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules. In Proc. VLDB’94, pp. 487–499, 1994.

  5. R. Agrawal, C. Aggarwal and V. Prasad. Depth First Generation of Long Patterns. In KDD 2000, ACM, pp. 108–118.

  6. E. Boros, V. Gurvich, L. Khachiyan and K. Makino. On the complexity of generating maximal frequent and minimal infrequent sets. In STACS 2002.

  7. F. Coenen, G. Goulbourne and P. Leng. Computing Association Rules using Partial Totals. In L. De Raedt and A. Siebes (eds), Principles of Data Mining and Knowledge Discovery (Proc. 5th European Conference, PKDD 2001, Freiburg, Sept. 2001), Lecture Notes in AI 2168, Springer-Verlag, Berlin, Heidelberg, pp. 54–66.

  8. F. Coenen, G. Goulbourne and P. Leng. Tree Structures for Mining Association Rules. Data Mining and Knowledge Discovery, 8 (2004), pp. 25–51.

  9. G. Goulbourne, F. Coenen and P. Leng. Algorithms for Computing Association Rules using a Partial-Support Tree. Journal of Knowledge-Based Systems 13 (2000), pp. 141–149.

  10. J. Han, J. Pei, Y. Yin and R. Mao. Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach. Data Mining and Knowledge Discovery, 8 (2004), pp. 53–87.

  11. A. Savasere, E. Omiecinski and S. Navathe. An Efficient Algorithm for Mining Association Rules in Large Databases. In VLDB 1995, pp. 432–444.

  12. H. Toivonen. Sampling Large Databases for Association Rules. In VLDB 1996, pp. 1–12.

  13. M. J. Zaki. Generating Non-Redundant Association Rules. In Proc. SIGKDD-2000, pp. 34–43, 2000.

Copyright information

© 2006 Springer-Verlag London Limited

About this paper

Cite this paper

Coenen, F., Leng, P., Pagourtzis, A., Rytter, W., Souliou, D. (2006). Improved Methods for Extracting Frequent Itemsets from Interim-Support Trees. In: Bramer, M., Coenen, F., Allen, T. (eds) Research and Development in Intelligent Systems XXII. SGAI 2005. Springer, London. https://doi.org/10.1007/978-1-84628-226-3_20

  • DOI: https://doi.org/10.1007/978-1-84628-226-3_20

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84628-225-6

  • Online ISBN: 978-1-84628-226-3

  • eBook Packages: Computer Science, Computer Science (R0)
