Abstract
The problem of extracting association rules from databases is well known. The most demanding part of the problem is determining the support for all those sets of attributes that occur often enough to be of possible interest. In previous work we described methods that approach the problem by first constructing a tree (the P-tree) that records all the relevant information in the database together with a partial computation of the support totals. This approach offers significant performance advantages over comparable alternative methods, as we have demonstrated experimentally with memory-resident datasets. In practice, however, the real focus of interest is on much larger databases. In this paper we discuss strategies for partitioning the data in these cases, and present the results of a performance analysis.
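The idea of storing partial support totals, and why they survive partitioning, can be shown with a deliberately simplified sketch. It is not the authors' P-tree and does not reproduce the partitioning strategies studied in the paper. It assumes records are sets of integer item codes. It stores a count Q(p) for every prefix p of every sorted record in a plain dictionary rather than a tree. It recovers the total support T(i) of a non-empty itemset i by summing Q(p) over the stored prefixes that contain i and end with i's largest item. Each record containing i has exactly one such prefix, so the sum counts every such record exactly once. Horizontal partitioning into fixed-size blocks of records is assumed only to show that these counts add up across partitions. The names `partial_supports`, `merge_tables`, `total_support` and `horizontal_partitions` are hypothetical.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List, Sequence


def partial_supports(records: Iterable[Sequence[int]]) -> Counter:
    """Q(p): number of records whose sorted item list begins with prefix p."""
    q: Counter = Counter()
    for record in records:
        items = tuple(sorted(set(record)))  # canonical order, duplicates removed
        for k in range(1, len(items) + 1):
            q[items[:k]] += 1  # every prefix of this record gets one partial count
    return q


def merge_tables(tables: Iterable[Counter]) -> Counter:
    """Prefix counts are additive, so per-partition tables can simply be summed."""
    total: Counter = Counter()
    for table in tables:
        total.update(table)  # Counter.update adds counts rather than replacing them
    return total


def total_support(q: Counter, itemset: Sequence[int]) -> int:
    """T(i): sum of Q(p) over stored prefixes p that contain i and end with max(i).

    A record containing i has exactly one prefix ending at max(i), and that
    prefix contains every item of i, so each such record is counted once.
    """
    target = set(itemset)
    if not target:
        raise ValueError("itemset must be non-empty")
    last = max(target)
    return sum(count for prefix, count in q.items()
               if prefix[-1] == last and target.issubset(prefix))


def horizontal_partitions(records: Iterable[Sequence[int]],
                          size: int) -> Iterator[List[Sequence[int]]]:
    """Hypothetical horizontal partitioning: consecutive blocks of `size` records."""
    it = iter(records)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


if __name__ == "__main__":
    records = [[1, 2, 3], [1, 3], [2, 3, 4], [1, 2, 3, 4], [3, 4]]
    # Build the prefix-count table one partition at a time, then merge.
    q = merge_tables(partial_supports(b) for b in horizontal_partitions(records, 2))
    print(total_support(q, (1, 3)))  # 3
```

Scanning every stored prefix on each query keeps the sketch short. A tree-structured store would avoid that scan by grouping prefixes that share leading items.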
Copyright information
© 2004 Springer-Verlag London
About this paper
Cite this paper
Ahmed, S., Coenen, F., Leng, P. (2004). Strategies for Partitioning Data in Association Rule Mining. In: Coenen, F., Preece, A., Macintosh, A. (eds) Research and Development in Intelligent Systems XX. SGAI 2003. Springer, London. https://doi.org/10.1007/978-0-85729-412-8_10
DOI: https://doi.org/10.1007/978-0-85729-412-8_10
Publisher Name: Springer, London
Print ISBN: 978-1-85233-780-3
Online ISBN: 978-0-85729-412-8
eBook Packages: Springer Book Archive