Abstract
A well-known approach to Knowledge Discovery in Databases involves the identification of association rules linking database attributes. Extracting all possible association rules from a database, however, is a computationally intractable problem, because of the combinatorial explosion in the number of sets of attributes for which incidence-counts must be computed. Existing methods for dealing with this may require multiple passes over the database, and still tend to cope badly with densely-packed database records. We describe here a class of methods we have introduced that begin by using a single database pass to perform a partial computation of the totals required, storing these in the form of a set enumeration tree, which is created in time linear in the size of the database. Algorithms for using this structure to complete the count summations are discussed, and one such method, derived from the well-known Apriori algorithm, is described. Results are presented demonstrating the performance advantage to be gained from the use of this approach. Finally, we discuss possible further applications of the method.
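The idea of counting partially in one pass and finishing the summation later can be illustrated with a small sketch. This is not the tree structure defined in the article. It is a plain prefix tree over sorted items, used here as a simplified stand-in, and the names `Node`, `build_tree`, and `support` are hypothetical. Each node stores how many records pass through it, and the total support of an itemset is recovered afterwards by summing the relevant partial counts.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Number of records whose sorted item sequence passes through this node.
    count: int = 0
    children: dict = field(default_factory=dict)


def build_tree(records):
    """Single pass over the records: insert each as a sorted path of items."""
    root = Node()
    for record in records:
        node = root
        node.count += 1
        for item in sorted(set(record)):
            node = node.children.setdefault(item, Node())
            node.count += 1
    return root


def support(root, itemset):
    """Total support of itemset, summed from partial counts held in the tree."""
    target = sorted(set(itemset))

    def walk(node, pos):
        if pos == len(target):
            # Every record passing through this node contains the whole itemset.
            return node.count
        total = 0
        for item, child in node.children.items():
            if item == target[pos]:
                total += walk(child, pos + 1)
            elif item < target[pos]:
                total += walk(child, pos)
            # item > target[pos]: paths are sorted, so the needed item
            # cannot occur anywhere below this child; skip the subtree.
        return total

    return walk(root, 0)


# Illustrative data only (assumed, not taken from the article).
records = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b"}, {"c"}]
tree = build_tree(records)
```

With the sample records above, `support(tree, {"a", "b"})` visits only the branch that starts with `a`, because any branch starting with a later item cannot contain `a`. Pruning of this kind is what makes a stored tree of partial counts cheaper to query than rescanning the records.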
Cite this article
Coenen, F., Goulbourne, G. & Leng, P. Tree Structures for Mining Association Rules. Data Mining and Knowledge Discovery 8, 25–51 (2004). https://doi.org/10.1023/B:DAMI.0000005257.93780.3b
DOI: https://doi.org/10.1023/B:DAMI.0000005257.93780.3b