Interestingness measures for association rules: Combination between lattice and hash tables☆
Highlights
► We propose combining a frequent itemsets lattice with hash tables for mining association rules with interestingness measures.
► The method includes two phases: (1) building the frequent itemsets lattice, and (2) mining interesting association rules.
► The lattice is used to obtain the support of the itemset on the left-hand side of a rule, and hash tables are used to obtain the support of the itemset on the right-hand side.
Introduction
Since the problem of mining association rules was introduced in 1993 (Agrawal, Imielinski, & Swami, 1993), many algorithms have been developed to improve the efficiency of mining association rules, such as Apriori (Agrawal & Srikant, 1994), FP-tree (Grahne and Zhu, 2005, Han and Kamber, 2006, Wang et al., 2003), and IT-tree (Zaki & Hsiao, 2005). Although these approaches differ, they work in nearly the same way. Their mining processes are usually divided into the following two phases:
- (i) Mining frequent itemsets;
- (ii) Generating association rules from them.
In recent years, researchers have studied interestingness measures for mining interesting association rules (Aljandal et al., 2008, Athreya and Lahiri, 2006, Bayardo and Agrawal, 1999, Brin et al., 1997, Freitas, 1999, Holena, 2009, Hilderman and Hamilton, 2001, Huebner, 2009; Huynh et al., 2007, chap. 2; Lee et al., 2003, Lenca et al., 2008, McGarry, 2005, Omiecinski, 2003, Piatetsky-Shapiro, 1991, Shekar and Natarajan, 2004, Steinbach et al., 2007, Tan et al., 2002, Waleed, 2009, Yafi et al., 2007, Yao et al., 2006). Many measures have been proposed, such as support, confidence, cosine, lift, chi-square, gini-index, Laplace, and phi-coefficient (about 35 measures; Huynh et al., 2007). Although their equations differ, they all use four elements to compute the measure value of a rule X → Y: (i) n; (ii) nX; (iii) nY; and (iv) nXY, where n is the number of transactions, nX is the number of transactions containing X, nY is the number of transactions containing Y, and nXY is the number of transactions containing both X and Y. The other elements needed to compute a measure value are derived from n, nX, nY, and nXY.
We have nX = support(X), nY = support(Y), and nXY = support(XY). Therefore, once support(X), support(Y), and support(XY) are determined (together with n), the values of all measures of a rule are determined.
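As a minimal illustration of this point, the sketch below computes four of the measures named above from the counts n, nX, nY, and nXY alone. The function name `measures` and the example counts are hypothetical, and the formulas are the standard textbook definitions rather than anything taken from the paper's tables.

```python
import math


def measures(n, n_x, n_y, n_xy):
    """Return a few interestingness measures of the rule X -> Y.

    n    : number of transactions
    n_x  : number of transactions containing X
    n_y  : number of transactions containing Y
    n_xy : number of transactions containing both X and Y
    """
    return {
        # Fraction of all transactions that contain both X and Y.
        "support": n_xy / n,
        # Fraction of the transactions containing X that also contain Y.
        "confidence": n_xy / n_x,
        # Ratio of the observed co-occurrence to the one expected
        # if X and Y were independent.
        "lift": (n * n_xy) / (n_x * n_y),
        # Co-occurrence normalised by the geometric mean of n_x and n_y.
        "cosine": n_xy / math.sqrt(n_x * n_y),
    }


# Made-up counts, for illustration only.
result = measures(n=10, n_x=4, n_y=5, n_xy=2)
```

With these counts the lift is exactly 1, which is the value that indicates X and Y occur together no more often than independence would predict.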
Most previous studies were carried out on small databases, whereas databases are often very large in practice. For example, Huynh et al. only mined databases that produce a small number of rules (about one hundred thousand rules; Huynh et al., 2007). In fact, many databases contain millions of transactions and thousands of items and yield millions of rules, so the time needed to generate the association rules and compute their measure values is very long. Therefore, this paper proposes a method for computing the interestingness measure values of association rules quickly. We use a lattice to determine the itemsets X and XY and their supports. To determine the support of Y, we use hash tables.
The rest of this paper is organized as follows. Section 2 presents related work on interestingness measures. Section 3 discusses interestingness measures for mining association rules. Section 4 presents the lattice and hash tables, together with an algorithm for building the lattice quickly. Section 5 presents an algorithm for generating association rules with their measure values using the lattice and hash tables. Section 6 presents experimental results, and Section 7 concludes our work.
Related work
There are many studies on interestingness measures. In 1991, Piatetsky-Shapiro proposed the statistical independence of rules as an interestingness measure (Piatetsky-Shapiro, 1991). After that, many measures were proposed. In 1994, Agrawal and Srikant proposed the support and confidence measures for mining association rules, and discussed the Apriori algorithm for mining rules (Agrawal & Srikant, 1994). Lift and χ² were proposed as correlation measures (Brin et al., 1997). Hilderman and
Association rules mining
An association rule is an expression of the form X → Y (q, vm), where q = support(XY) and vm is a measure value. For example, in traditional association rules, vm is the confidence of the rule and vm = support(XY)/support(X).
To mine traditional association rules quickly (that is, rules with the confidence measure), we can use hash tables (Han & Kamber, 2006). Vo and Le presented a new method for mining association rules using a frequent itemsets lattice (FIL) (Vo & Le, 2009). The process includes two phases: (i) Building FIL; (ii) Generating
Building FIL
Vo and Le presented an algorithm for building FIL quickly (Vo & Le, 2009). We present it here to make the following sections easier to read.
First, the algorithm initializes the equivalence class [∅], which contains all frequent 1-itemsets. Next, it calls the ENUMERATE_LATTICE([P]) function to create a new frequent itemset by combining two frequent itemsets of the equivalence class [P], and produces a lattice node {I} (if I is frequent). The algorithm will add a new node {I} into a set of child nodes of
Mining association rules with interestingness measures
This section presents an algorithm for mining association rules with a given interestingness measure. First, we traverse the lattice to determine X, XY, and their supports. For Y, we compute a key from its items y (each y is a prime number or an integer). Based on the length of Y and this key, we can get its support from the hash tables.
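The lookup of support(Y) by length and key can be sketched roughly as follows. This is only an illustration under stated assumptions: the key function (`key_of`, here the sum of integer item codes), the per-length table layout, the helper `support_of`, and the toy supports are all hypothetical, since this extract does not show the paper's actual key formula or data structures.

```python
from collections import defaultdict

# Toy frequent itemsets with absolute supports (made-up numbers).
# Items are encoded as integers, as the text allows.
supports = {
    frozenset({1}): 4, frozenset({2}): 5,
    frozenset({3}): 3, frozenset({4}): 2,
    frozenset({1, 2}): 3, frozenset({2, 3}): 3, frozenset({1, 4}): 2,
}


def key_of(itemset):
    # ASSUMPTION: the key is the sum of the item codes. The paper's
    # real key formula is not visible in this extract.
    return sum(itemset)


# One hash table per itemset length. Each bucket stores (itemset, support)
# pairs so that itemsets sharing a key, such as {2, 3} and {1, 4},
# can still be distinguished.
tables = defaultdict(lambda: defaultdict(list))
for itemset, sup in supports.items():
    tables[len(itemset)][key_of(itemset)].append((itemset, sup))


def support_of(itemset):
    """Look up the support of an itemset by its length and key."""
    bucket = tables[len(itemset)].get(key_of(itemset), [])
    for candidate, sup in bucket:
        if candidate == itemset:
            return sup
    return None  # the itemset is not frequent


# Rule X -> Y with X = {1} and XY = {1, 2}. In the method described above,
# X and XY come from the lattice; here both are simply read from `supports`.
x, xy = frozenset({1}), frozenset({1, 2})
y = xy - x
n_x, n_xy, n_y = supports[x], supports[xy], support_of(y)
confidence = n_xy / n_x
```

Grouping the tables by length keeps each table small, and storing the itemset alongside its support guards against two different itemsets producing the same key.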
Experimental results
All experiments described below were performed on a Centrino Core 2 Duo (2 × 2.53 GHz) machine with 4 GB of RAM, running Windows 7, and the algorithms were coded in C# (2008). The experimental databases were downloaded from http://fimi.cs.helsinki.fi/data/, and their features are shown in Table 7.
We tested the proposed algorithm on many databases. Mushroom and Chess have few items and transactions, and Chess is a dense database (more items with high frequency). The number of items in
Conclusion and future work
In this paper, we proposed a new method for mining association rules with interestingness measures. This method uses a lattice and hash tables to compute the interestingness measure values quickly. Experimental results show that the proposed method is very efficient compared with using hash tables only. For itemsets X and XY, we get their supports by traversing the lattice and marking all traversed nodes. For itemset Y, we use hash tables to get its support. When we only compare the time
References (28)
- On rule interestingness measures. Knowledge-based Systems (1999)
- Measures of ruleset quality for general rules extraction methods. International Journal of Approximate Reasoning (Elsevier) (2009)
- et al. On selecting interestingness measures for association rules: User oriented description and multiple criteria decision aid. European Journal of Operational Research (2008)
- Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In VLDB’94 (pp....
- Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. In...
- Aljandal, W., Hsu, W. H., Bahirwani, V., Caragea, D., & Weninger, T. (2008). Validation-based normalization and...
- et al. Measure theory and probability theory (2006)
- Bayardo, R. J., & Agrawal, R. (1999). Mining the most interesting rules. In Proceedings of the fifth ACM SIGKDD (pp....
- Brin, S., Motwani, R., Ullman, J. D., & Tsur, S. (1997). Dynamic itemset counting and implication rules for market...
- et al. Fast algorithms for frequent itemset mining using FP-trees. IEEE Transactions on Knowledge and Data Engineering (2005)
- Data mining: Concept and techniques
- Knowledge discovery and measures of interest
- A graph-based clustering approach to evaluate interestingness measures: A tool and a comparative study
☆ This work was supported by Vietnam’s National Foundation for Science and Technology Development (NAFOSTED), project ID: 102.01-2010.02.