Elsevier

Expert Systems with Applications

Volume 38, Issue 9, September 2011, Pages 11630-11640

Interestingness measures for association rules: Combination between lattice and hash tables

https://doi.org/10.1016/j.eswa.2011.03.042

Abstract

Many methods have been developed to reduce the time needed to mine frequent itemsets. However, the time needed to generate association rules has not been studied in depth. In practice, if a database contains many frequent itemsets (from thousands up to millions), generating association rules takes longer than mining the frequent itemsets. In this paper, we present a combination of a lattice and hash tables for mining association rules with different interestingness measures. Our method has two phases: (1) building the frequent itemsets lattice and (2) generating interesting association rules by combining the lattice and hash tables. To compute the measure value of a rule quickly, we use the lattice to get the support of the left-hand side and hash tables to get the support of the right-hand side. Experimental results show that our method requires less mining time than mining directly from frequent itemsets using hash tables only.

Highlights

► We propose a combination of a frequent itemsets lattice and hash tables for mining association rules with interestingness measures.
► The method has two phases: (1) building the frequent itemsets lattice, and (2) mining interesting association rules.
► The lattice is used to get the support of the itemset on the left-hand side of a rule, and hash tables are used to get the support of the itemset on the right-hand side.

Introduction

Since the problem of mining association rules was introduced in 1993 (Agrawal, Imielinski, & Swami, 1993), many algorithms have been developed to improve the efficiency of mining association rules, such as Apriori (Agrawal & Srikant, 1994), FP-tree (Grahne and Zhu, 2005, Han and Kamber, 2006, Wang et al., 2003), and IT-tree (Zaki & Hsiao, 2005). Although these approaches differ, they proceed in nearly the same way. The mining process is usually divided into the following two phases:

  • (i) Mining frequent itemsets;

  • (ii) Generating association rules from them.
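The two phases can be illustrated with a deliberately naive sketch. It is not any of the cited algorithms: it uses a brute-force level-wise search and the confidence measure only, and the function names `frequent_itemsets` and `rules` are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_count):
    """Phase (i): level-wise search; returns {frozenset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    supports = {}
    k = 1
    level = [frozenset([i]) for i in items]
    while level:
        # Count each candidate by scanning all transactions (naive on purpose).
        counted = {c: sum(1 for t in transactions if c <= t) for c in level}
        frequent = {c: s for c, s in counted.items() if s >= min_count}
        supports.update(frequent)
        # Candidates of size k+1 are unions of two frequent k-itemsets.
        k += 1
        level = list({a | b for a in frequent for b in frequent
                      if len(a | b) == k})
    return supports

def rules(supports, min_conf):
    """Phase (ii): split each frequent itemset XY into a rule X -> Y."""
    out = []
    for xy, n_xy in supports.items():
        for r in range(1, len(xy)):
            for x in map(frozenset, combinations(sorted(xy), r)):
                # Every subset of a frequent itemset is frequent, so x is present.
                conf = n_xy / supports[x]
                if conf >= min_conf:
                    out.append((x, xy - x, n_xy, conf))
    return out

if __name__ == "__main__":
    db = ["ACTW", "CDW", "ACTW", "ACDW", "ACDTW", "CDT"]  # toy data, one string per transaction
    sup = frequent_itemsets(db, min_count=3)
    for x, y, n_xy, conf in rules(sup, min_conf=1.0):
        print(sorted(x), "->", sorted(y), n_xy, conf)
```

Phase (ii) looks up the support of the left-hand side once per rule, so with millions of frequent itemsets the cost of these lookups dominates.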

In recent years, some researchers have studied interestingness measures for mining interesting association rules (Aljandal et al., 2008, Athreya and Lahiri, 2006, Bayardo and Agrawal, 1999, Brin et al., 1997, Freitas, 1999, Holena, 2009, Hilderman and Hamilton, 2001, Huebner, 2009; Huynh et al., 2007, chap. 2; Lee et al., 2003, Lenca et al., 2008, McGarry, 2005, Omiecinski, 2003, Piatetsky-Shapiro, 1991, Shekar and Natarajan, 2004, Steinbach et al., 2007, Tan et al., 2002, Waleed, 2009, Yafi et al., 2007, Yao et al., 2006). Many measures have been proposed, such as support, confidence, cosine, lift, chi-square, gini-index, Laplace, and phi-coefficient (about 35 measures; Huynh et al., 2007). Although their equations differ, they all use four elements to compute the measure value of a rule $X \rightarrow Y$: (i) $n$; (ii) $n_X$; (iii) $n_Y$; and (iv) $n_{XY}$, where $n$ is the number of transactions, $n_X$ is the number of transactions containing X, $n_Y$ is the number of transactions containing Y, and $n_{XY}$ is the number of transactions containing both X and Y. The other elements used to compute the measure value are determined from $n$, $n_X$, $n_Y$, and $n_{XY}$ as follows: $n_{\bar{X}} = n - n_X$, $n_{\bar{Y}} = n - n_Y$, $n_{X\bar{Y}} = n_X - n_{XY}$, $n_{\bar{X}Y} = n_Y - n_{XY}$, and $n_{\overline{XY}} = n - n_{XY}$.

We have $n_X$ = support(X), $n_Y$ = support(Y), and $n_{XY}$ = support(XY). Therefore, once support(X), support(Y), and support(XY) are known, the values of all measures of a rule can be determined.
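As a small illustration of this point, the sketch below computes a few measures from the four counts alone. The formulas are the standard textbook definitions of support, confidence, lift, and cosine, which may differ in detail from the ones tabulated in the paper; the function names are hypothetical.

```python
import math

def derived_counts(n, n_x, n_y, n_xy):
    """Complementary counts obtained from the four basic counts."""
    return {
        "not_x": n - n_x,           # transactions without X
        "not_y": n - n_y,           # transactions without Y
        "x_not_y": n_x - n_xy,      # transactions with X but not Y
        "y_not_x": n_y - n_xy,      # transactions with Y but not X
        "not_xy": n - n_xy,         # transactions not containing both X and Y
    }

def measures(n, n_x, n_y, n_xy):
    """A few interestingness measures of the rule X -> Y (standard definitions)."""
    return {
        "support": n_xy / n,
        "confidence": n_xy / n_x,
        "lift": n * n_xy / (n_x * n_y),
        "cosine": n_xy / math.sqrt(n_x * n_y),
    }

if __name__ == "__main__":
    # Hypothetical counts: 6 transactions, X in 4, Y in 6, both in 4.
    print(derived_counts(6, 4, 6, 4))
    print(measures(6, 4, 6, 4))
```

No measure needs another scan of the database once the three supports are available, which is why the cost of looking up support(X), support(Y), and support(XY) is what matters.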

We can see that most previous studies were carried out on small databases. However, databases are often very large in practice. For example, Huynh et al. only mined databases in which the number of rules is small (about one hundred thousand rules; Huynh et al., 2007). In fact, many databases contain millions of transactions and thousands of items and yield millions of rules, so the time for generating association rules and computing their measure values is very long. Therefore, this paper proposes a method for computing the interestingness measure values of association rules quickly. We use the lattice to determine itemsets X and XY and their supports. To determine the support of Y, we use hash tables.

The rest of this paper is organized as follows. Section 2 presents related work on interestingness measures. Section 3 discusses interestingness measures for mining association rules. Section 4 presents the lattice and hash tables, together with an algorithm for building the lattice quickly. Section 5 presents an algorithm for generating association rules with their measure values using the lattice and hash tables. Section 6 presents experimental results, and Section 7 concludes our work.

Section snippets

Related work

There are many studies on interestingness measures. In 1991, Piatetsky-Shapiro proposed the statistical independence of rules as an interestingness measure (Piatetsky-Shapiro, 1991). After that, many measures were proposed. In 1994, Agrawal and Srikant proposed the support and confidence measures for mining association rules (Agrawal & Srikant, 1994). The Apriori algorithm for mining rules was also discussed. Lift and $\chi^2$ were proposed as correlation measures (Brin et al., 1997). Hilderman and

Association rules mining

An association rule is an expression of the form $X \xrightarrow{q,\,vm} Y$ ($X \cap Y = \emptyset$), where q = support(XY) and vm is a measure value. For example, in traditional association rules, vm is the confidence of the rule, and vm = support(XY)/support(X).

To mine traditional association rules (rules with the confidence measure) quickly, we can use hash tables (Han & Kamber, 2006). Vo and Le presented a new method for mining association rules using a frequent itemsets lattice (FIL) (Vo & Le, 2009). The process includes two phases: (i) Building FIL; (ii) Generating

Building FIL

Vo and Le presented an algorithm for building FIL quickly (Vo & Le, 2009). We summarize it here to make the following sections easier to read.

First, the algorithm initializes the equivalence class [∅], which contains all frequent 1-itemsets. Next, it calls the ENUMERATE_LATTICE([P]) function to create a new frequent itemset by combining two frequent itemsets of equivalence class [P], and produces a lattice node {I} (if I is frequent). The algorithm will add a new node {I} into a set of child nodes of
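The description above is cut off, so the following is only a simplified sketch of the general idea and not Vo and Le's algorithm. It assumes an Eclat-style enumeration over equivalence classes with transaction-id sets, and it links parents to children in a separate final pass, which is an assumption made for brevity. `Node`, `build_lattice`, and `enumerate_class` are hypothetical names.

```python
class Node:
    """A lattice node: a frequent itemset, its tidset, and its child nodes."""
    def __init__(self, itemset, tids):
        self.itemset = itemset    # frozenset of items
        self.tids = tids          # ids of transactions containing the itemset
        self.children = []        # frequent supersets with exactly one more item

    @property
    def support(self):
        return len(self.tids)

def build_lattice(transactions, min_count):
    """Return (root, nodes) where nodes maps each frequent itemset to its node."""
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    root = Node(frozenset(), set(range(len(transactions))))
    nodes = {root.itemset: root}
    # Equivalence class of the empty prefix: all frequent 1-itemsets.
    first = [Node(frozenset([i]), tids)
             for i, tids in sorted(tidsets.items()) if len(tids) >= min_count]
    for node in first:
        nodes[node.itemset] = node

    def enumerate_class(members):
        # Combine each member with the later members of the same class.
        for idx, a in enumerate(members):
            new_class = []
            for b in members[idx + 1:]:
                tids = a.tids & b.tids
                if len(tids) >= min_count:
                    child = Node(a.itemset | b.itemset, tids)
                    nodes[child.itemset] = child
                    new_class.append(child)
            if new_class:
                enumerate_class(new_class)

    enumerate_class(first)

    # Link every node to all of its immediate subsets (its parents).
    for itemset, node in nodes.items():
        for item in itemset:
            nodes[itemset - {item}].children.append(node)
    return root, nodes
```

With the lattice in place, the supersets XY of a node X are reachable by following `children`, and both supports are stored on the nodes, so no lookup by key is needed for X or XY.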

Mining association rules with interestingness measures

This section presents an algorithm for mining association rules with a given interestingness measure. First, we traverse the lattice to determine X, XY, and their supports. For Y, we compute the key $k = \sum_{y \in Y} y$ (where each item y is represented by a prime number or an integer). Based on the length of Y and its key, we can get its support.
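A minimal sketch of such a lookup is given below. It assumes the key of an itemset is the sum of the integer codes of its items and that itemsets are bucketed first by length and then by key; the class `SupportTable` and the mapping `item_code` are hypothetical names, not taken from the paper.

```python
from collections import defaultdict

class SupportTable:
    """Two-level hash table: itemset length -> key -> [(itemset, support)]."""

    def __init__(self, item_code):
        # item_code maps each item to an integer (for example, a prime number).
        self.item_code = item_code
        self.buckets = defaultdict(lambda: defaultdict(list))

    def key(self, itemset):
        # Assumed key: the sum of the codes of the items in the itemset.
        return sum(self.item_code[i] for i in itemset)

    def add(self, itemset, support):
        itemset = frozenset(itemset)
        self.buckets[len(itemset)][self.key(itemset)].append((itemset, support))

    def support(self, itemset):
        itemset = frozenset(itemset)
        by_key = self.buckets.get(len(itemset), {})
        # Different itemsets can share a key, so compare the itemsets themselves.
        for candidate, sup in by_key.get(self.key(itemset), []):
            if candidate == itemset:
                return sup
        return None

if __name__ == "__main__":
    table = SupportTable({"A": 1, "C": 2, "D": 3, "T": 4, "W": 5})
    table.add("AT", 3)   # key 1 + 4 = 5
    table.add("CD", 4)   # key 2 + 3 = 5, same bucket as AT
    print(table.support("CD"), table.support("AD"))
```

Sums of codes can collide, as AT and CD do here, so each bucket keeps the itemsets and the lookup ends with an exact comparison. Grouping by length first keeps the buckets short.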

Experimental results

All experiments described below were performed on a Centrino Core 2 Duo (2 × 2.53 GHz) with 4 GB of RAM, running Windows 7; the algorithms were coded in C# (2008). The experimental databases were downloaded from http://fimi.cs.helsinki.fi/data/, and their features are shown in Table 7.

We tested the proposed algorithm on several databases. Mushroom and Chess have few items and transactions; Chess is a dense database (many items with high frequency). The number of items in

Conclusion and future work

In this paper, we proposed a new method for mining association rules with interestingness measures. This method uses a lattice and hash tables to compute the interestingness measure values quickly. Experimental results show that the proposed method is very efficient compared with using hash tables only. For itemsets X and XY, we get their supports by traversing the lattice and marking all traversed nodes. For itemset Y, we use hash tables to get its support. When we only compare the time

References (28)

  • Freitas, A. A. (1999). On rule interestingness measures. Knowledge-based Systems.
  • Holena, M. (2009). Measures of ruleset quality for general rules extraction methods. International Journal of Approximate Reasoning.
  • Lenca, P., et al. (2008). On selecting interestingness measures for association rules: User oriented description and multiple criteria decision aid. European Journal of Operational Research.
  • Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In VLDB’94 (pp....
  • Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. In...
  • Aljandal, W., Hsu, W. H., Bahirwani, V., Caragea, D., & Weninger, T. (2008). Validation-based normalization and...
  • Athreya, K. B., et al. (2006). Measure theory and probability theory.
  • Bayardo, R. J., & Agrawal, R. (1999). Mining the most interesting rules. In Proceedings of the fifth ACM SIGKDD (pp....
  • Brin, S., Motwani, R., Ullman, J. D., & Tsur, S. (1997). Dynamic itemset counting and implication rules for market...
  • Grahne, G., et al. (2005). Fast algorithms for frequent itemset mining using FP-trees. IEEE Transactions on Knowledge and Data Engineering.
  • Han, J., et al. (2006). Data mining: Concepts and techniques.
  • Hilderman, R., et al. (2001). Knowledge discovery and measures of interest.
  • Huebner, R. A. (2009). Diversity-based interestingness measures for association rule mining. In Proceedings of ASBBS...
  • Huynh, H. X., et al. (2007). A graph-based clustering approach to evaluate interestingness measures: A tool and a comparative study.
This work was supported by Vietnam’s National Foundation for Science and Technology Development (NAFOSTED), project ID: 102.01-2010.02.