Elsevier

Knowledge-Based Systems

Volume 45, June 2013, Pages 20-30
A lattice-based approach for mining most generalization association rules

https://doi.org/10.1016/j.knosys.2013.02.003

Abstract

Traditional association rules contain redundant information. Variants based on the support and confidence measures, such as non-redundant rules and minimal non-redundant rules, were thus proposed to reduce this redundancy. In the past, we proposed most generalization association rules (MGARs), which are more compact than (minimal) non-redundant rules in that they consider the condition of equal or higher confidence, instead of only equal confidence. However, the execution time for generating MGARs increases with the number of frequent closed itemsets. Since lattices are an effective data structure widely used in data mining, in this paper we propose a lattice-based approach for fast mining of most generalization association rules. First, a new algorithm for building a frequent-closed-itemset lattice is introduced. Next, a theorem on pruning nodes in the lattice for rule generation is derived. Finally, an algorithm for fast mining of MGARs from the constructed lattice is developed. The proposed algorithm is tested on several databases, and the results show that it is more efficient than mining MGARs directly from frequent closed itemsets.

Introduction

Mining association rules is an important task in data mining and knowledge discovery [1]. Association rules have wide applications, such as basket data analysis [1], semantic web mining [13], and text mining [32], among others. Several kinds of rules have been proposed in the past, including traditional association rules [1], non-redundant association rules [29], [30], and minimal non-redundant association rules [2]. Given a transaction database, mining traditional association rules means generating all rules whose supports satisfy the minimum support threshold (minSup) and whose confidences satisfy the minimum confidence threshold (minConf). An association rule R: $X \rightarrow Y$ is called a minimal non-redundant association rule if and only if there does not exist an association rule with the same support value and confidence value as R, but with a more specific antecedent part and a more general consequent part. Although the methods for mining these rules differ, their processing is nearly the same. The mining process is usually divided into the following two phases:

  • (i) Mining frequent itemsets (FIs) or frequent closed itemsets (FCIs).

  • (ii) Mining traditional association rules or (minimal) non-redundant association rules from FIs or FCIs.
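The two phases can be illustrated with a brute-force sketch. This is not any of the cited algorithms: it enumerates every candidate itemset, which is only practical for tiny inputs. The four-transaction database, the function names, and the thresholds are hypothetical and chosen only for illustration; support is taken as an absolute count.

```python
from itertools import combinations

def support(db, itemset):
    """Number of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t)

def frequent_itemsets(db, min_sup):
    """Phase (i): brute-force enumeration of all itemsets with support >= min_sup."""
    items = sorted(set().union(*db))
    freq = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(db, frozenset(combo))
            if s >= min_sup:
                freq[frozenset(combo)] = s
    return freq

def association_rules(freq, min_conf):
    """Phase (ii): split each frequent itemset Z into X -> Z \\ X and keep confident rules."""
    rules = []
    for z, sup_z in freq.items():
        for k in range(1, len(z)):
            for ante in combinations(sorted(z), k):
                x = frozenset(ante)
                # Every subset of a frequent itemset is frequent, so freq[x] exists.
                conf = sup_z / freq[x]
                if conf >= min_conf:
                    rules.append((x, z - x, sup_z, conf))
    return rules

# Hypothetical toy database (not Table 1 of the paper).
db = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("b")]
freq = frequent_itemsets(db, min_sup=2)
rules = association_rules(freq, min_conf=0.9)
for x, y, sup, conf in rules:
    print("".join(sorted(x)), "->", "".join(sorted(y)), sup, conf)
```

Practical miners avoid the exhaustive enumeration in phase (i); the sketch only makes the division of labour between the two phases concrete.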

Traditional association rule mining generates many redundant rules. Some approaches have thus been proposed to reduce the number of rules and make them more useful to users [2], [14], [15], [29], [30]. Although these approaches may generate fewer rules than the traditional approach, the number of rules is still large. For example, for the Chess database with minSup = 70% and minConf = 0%, the number of rules generated by Zaki’s method [30] is 152,074 and that by Bastide et al.’s method [2] is 3,373,625. In fact, in a mined knowledge set, some rules can be inferred from other rules.

To illustrate this problem, we consider the example database [31] shown in Table 1.

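If minimal non-redundant rules are mined with minSup = 50% and minConf = 80%, the result is nine rules, where the label on each arrow gives the support and confidence of the rule: {$D \xrightarrow{4,\,1} C$, $T \xrightarrow{4,\,1} C$, $W \xrightarrow{5,\,1} C$, $A \xrightarrow{4,\,1} CW$, $DW \xrightarrow{3,\,1} C$, $AT \xrightarrow{3,\,1} W$, $TW \xrightarrow{3,\,1} A$, $W \xrightarrow{4,\,4/5} AC$, $W \xrightarrow{4,\,4/5} AC$}.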

We can see that some of them are redundant according to their supports and confidences. For example, the rule $DW \xrightarrow{3,\,1} C$ is a weaker rule because there exist stronger rules such as $D \xrightarrow{4,\,1} C$ and $W \xrightarrow{5,\,1} C$. The rule $AT \xrightarrow{3,\,1} W$ is also a weaker rule because there is a stronger rule $A \xrightarrow{4,\,1} CW$. Similarly, the rule $W \xrightarrow{4,\,4/5} AC$ will not be kept since its more general rule $W \xrightarrow{5,\,1} C$ has been generated. The two previous methods (non-redundant association rules and minimal non-redundant association rules) do not prune the above rules.

Based on the considerations above, a method for pruning these weak rules is needed. We thus proposed an approach called mining the most generalization association rules [25] to generate a compact rule set. A most generalization association rule differs from a minimal non-redundant rule in that the former considers the condition of equal or higher confidence, instead of only equal confidence. That is, an association rule R: $X \rightarrow Y$ is a most generalization rule if and only if there does not exist an association rule with an equal or higher confidence value than R, but with a more specific antecedent part and a more general consequent part. We showed that the number of MGARs was smaller than the numbers of non-redundant association rules [30] and minimal non-redundant association rules [2]. We also developed some theorems for quickly pruning many rules directly. The remaining rules were then checked with the aid of a hash table. However, the execution time for generating MGARs increased with the number of FCIs. When the number of FCIs was large, the algorithm spent much time traversing the FCIs and checking whether an FCI X is a subset of an FCI Y [25]. The time complexity of this checking was analyzed to be $O(|FCIs|^2)$.
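The quadratic cost of the subset check can be seen in a naive sketch that compares every ordered pair of FCIs. This is an illustration of the cost only, not the algorithm of [25]; the function name is hypothetical, and the input is the seven FCIs that the paper lists for its example database.

```python
# The seven FCIs named in the paper's example (minSup = 50%).
fcis = [frozenset(s) for s in ["C", "CD", "CDW", "CT", "ACTW", "CW", "ACW"]]

def subset_pairs_naive(itemsets):
    """Return all pairs (X, Y) with X a proper subset of Y, plus the number of checks made."""
    pairs, checks = [], 0
    for x in itemsets:
        for y in itemsets:
            if x is y:
                continue
            checks += 1          # one subset test per ordered pair: n * (n - 1) in total
            if x < y:            # proper-subset test on frozensets
                pairs.append((x, y))
    return pairs, checks

pairs, checks = subset_pairs_naive(fcis)
print(len(fcis), "FCIs,", checks, "subset checks,", len(pairs), "pairs found")
```

Most of the checks find nothing, which is the waste a lattice avoids by storing the subset relation as explicit parent-child links.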

A useful application of this result is fast prediction. In prediction, the strongest rule in a rule set is often used to predict an item or a sequence of items. A smaller rule set requires less matching time. Therefore, mining a compact rule set can help reduce the prediction time.

Some lattice-based approaches for quickly mining association rules have recently been proposed [26], [27]. In this paper, we thus adopt a lattice structure and propose an approach based on the lattice for fast mining MGARs from a set of transactions. The proposed approach consists of two parts. The first part builds a frequent-closed-itemset lattice (FCIL) from the FCIs and the second part mines MGARs based on the FCIL constructed. Experimental results also show that the proposed lattice-based approach is more efficient than the previous FCI-based one in most cases.

The rest of this paper is organized as follows: Some studies related to mining association rules and building lattices are reviewed in Section 2. An algorithm for building an FCIL is designed in Section 3. The theorem and the algorithm used for generating the most generalization association rules based on FCIL are proposed in Section 4. Experimental results of the performance of the proposed algorithm and memory usage are shown in Section 5. We conclude our work in Section 6.

Mining frequent closed itemsets

Mining frequent itemsets plays an important role in the association rule mining process. A frequent itemset can be defined as follows. Let D be a transaction database and I be the set of items in D. The support (count) σ(X) of an itemset X ($X \subseteq I$) is the number of transactions in D that contain X. An itemset X is called a frequent itemset if the support of X is greater than or equal to minSup, where minSup is a predefined minimum support threshold.

Frequent closed itemsets are a variant of frequent

Concept and algorithm

As mentioned above, Zaki and Hsiao [31] proposed the CHARM-L algorithm for building an FCIL. When an FCIL is traversed to generate rules, the itemsets may not be visited in order of length. For example, consider the database shown in Table 1. It consists of six transactions and five items.

The FCIL built from the database in Table 1 with minSup = 50% is shown in Fig. 1. If we traverse it using the depth-first search (DFS), the list of FCIs is {C, CD, CDW, CT, ACTW, CW, ACW}. If we
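The traversal order quoted above can be reproduced with a small sketch that links each FCI to its immediate supersets and then walks the result depth-first. This is a naive quadratic construction for illustration, not CHARM-L and not the algorithm proposed in the paper. The function names are hypothetical, and visiting children in alphabetical order of their sorted labels is an assumption about how Fig. 1 is laid out.

```python
fcis = [frozenset(s) for s in ["C", "CD", "CDW", "CT", "ACTW", "CW", "ACW"]]

def label(itemset):
    """Canonical string for an itemset, e.g. frozenset('WC') -> 'CW'."""
    return "".join(sorted(itemset))

def build_lattice(itemsets):
    """Map each itemset to its children: supersets with no other itemset strictly in between."""
    children = {x: [] for x in itemsets}
    for x in itemsets:
        supersets = [y for y in itemsets if x < y]
        for y in supersets:
            if not any(x < z < y for z in supersets):
                children[x].append(y)
        children[x].sort(key=label)   # assumed child order
    return children

def dfs_order(children):
    """Depth-first traversal from the root nodes, visiting each node once."""
    has_parent = {c for kids in children.values() for c in kids}
    roots = sorted((x for x in children if x not in has_parent), key=label)
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        order.append(label(node))
        for child in children[node]:
            visit(child)

    for root in roots:
        visit(root)
    return order

lattice = build_lattice(fcis)
order = dfs_order(lattice)
print(order)
```

In the resulting order, the four-item set ACTW appears before the two-item set CW, which is the out-of-length-order behaviour the text describes.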

Mining most generalization association rules from FCIL

This section presents an algorithm for generating MGARs from an FCIL. The following theorem is first derived.

Theorem 1

Given three nodes $l_1$, $l_2$, and $l_3$ in FCIL, if $l_1$ is the parent node of $l_2$, $l_2$ is the parent node of $l_3$, and $\frac{l_2.sup}{l_1.sup} < minConf$, then $\frac{l_3.sup}{l_1.sup} < minConf$.

Proof

Since $l_1$ is the parent node of $l_2$ and $l_2$ is the parent node of $l_3$, $l_1.itemset \subset l_2.itemset \subset l_3.itemset$. This implies that $l_1.sup \geq l_2.sup \geq l_3.sup$. Thus, $\frac{l_2.sup}{l_1.sup} \geq \frac{l_3.sup}{l_1.sup}$. Since $\frac{l_2.sup}{l_1.sup} < minConf$, it implies $\frac{l_3.sup}{l_1.sup} < minConf$.
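One way Theorem 1 could be used during traversal is sketched below: starting from a node l1, a descendant whose support ratio falls below minConf is skipped together with everything beneath it. This is a hedged illustration, not the paper's rule-generation algorithm. The parent-child links are derived by subset inclusion from the FCIs listed in the paper; the supports are read from the rule annotations in the Introduction, except the value 6 for C, which is an assumption. The function name is hypothetical.

```python
# Parent -> children links among the example FCIs (derived by subset inclusion).
children = {
    "C": ["CD", "CT", "CW"],
    "CD": ["CDW"],
    "CT": ["ACTW"],
    "CW": ["ACW", "CDW"],
    "ACW": ["ACTW"],
    "CDW": [],
    "ACTW": [],
}
# Support counts; sup["C"] = 6 is an assumption, the rest follow the rule annotations.
sup = {"C": 6, "CD": 4, "CT": 4, "CW": 5, "ACW": 4, "CDW": 3, "ACTW": 3}

def confident_descendants(children, sup, l1, min_conf):
    """Descendants l of l1 with sup[l] / sup[l1] >= min_conf, pruning by Theorem 1."""
    result, seen = [], set()
    stack = list(children[l1])
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if sup[node] / sup[l1] < min_conf:
            # Theorem 1: every node below this one has an even smaller ratio,
            # so the whole subtree is skipped without being examined.
            continue
        result.append(node)
        stack.extend(children[node])
    return result

print(confident_descendants(children, sup, "C", 0.8))
print(sorted(confident_descendants(children, sup, "CW", 0.5)))
```

A node reachable through several parents, such as CDW, loses nothing when one path is pruned: its ratio to l1 does not depend on the path taken, so if it qualifies it is still reached through a parent that qualifies.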

Experimental results

Experiments were conducted to show the performance of the algorithms. They were implemented on a Centrino Core 2 Duo (2 × 2.53 GHz) PC with 4 GB of RAM and running Windows 7. The algorithms were coded in C# 2008. Seven databases from [6] were used for the experiments; their features are shown in Table 4.

Conclusions and future work

In this paper, we have proposed an effective approach for mining most generalization association rules from transaction databases. It utilizes FCIL to quickly find all pairs {X, Y}, in which both X and Y are FCIs and $X \subset Y$. An algorithm for building FCIL has been proposed as well. Experimental results show that the lattice-based approach is more efficient than the FCI-based one in most cases. Besides, the proposed lattice-based approach is quite suitable for reuse. If association rules under

Acknowledgment

This work was supported by Vietnam’s National Foundation for Science and Technology Development (NAFOSTED) under Grant Number 102.01-2012.47.

References (32)

  • J. Han, M. Kamber, Data Mining: Concepts and Techniques, second ed., Morgan Kaufmann Publishers, 2006, pp....
  • http://fimi.cs.helsinki.fi/data/ (download on April...
  • S.O. Kuznetsov et al., Comparing performance of algorithms for generating concept lattices, Journal of Experimental & Theoretical Artificial Intelligence (2002)
  • M. Liu et al., Reduction method for concept lattices based on rough set theory and its application, Journal of Computers & Mathematics with Applications (2007)
  • B. Lucchese et al., Fast and memory efficient mining of frequent closed itemsets, IEEE Transactions on Knowledge and Data Engineering (2006)
  • H.D.K. Moonestinghe, S. Fodeh, P.N. Tan, Frequent closed itemsets mining using prefix graphs with an efficient...