A lattice-based approach for mining most generalization association rules
Introduction
Mining association rules is an important task in data mining and knowledge discovery [1]. Association rules have wide applications, such as basket data analysis [1], semantic web mining [13], and text mining [32], among others. Several methods have been proposed for mining different kinds of association rules, such as traditional association rules [1], non-redundant association rules [29], [30], and minimal non-redundant association rules [2]. Given a transaction database, mining traditional association rules generates all rules whose supports satisfy the minimum support threshold (minSup) and whose confidences satisfy the minimum confidence threshold (minConf). An association rule R: X → Y is called a minimal non-redundant association rule if and only if there does not exist an association rule with the same support value and confidence value as R, but with a more specific antecedent part and a more general consequent part. Although the methods for mining association rules differ, their processing is nearly the same. Their mining processes are usually divided into the following two phases:
- (i) Mining frequent itemsets (FIs) or frequent closed itemsets (FCIs).
- (ii) Mining traditional association rules or (minimal) non-redundant association rules from FIs or FCIs.
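As a rough illustration of the two phases, the Python sketch below enumerates frequent itemsets level by level by brute force (phase i) and then derives every traditional rule that meets minSup and minConf (phase ii). It assumes that Table 1 is the classic six-transaction example database of Zaki and Hsiao [31] (ACTW, CDW, ACTW, ACDW, ACDTW, CDT), which is consistent with the seven FCIs listed for Table 1 later in the paper. The function names are hypothetical, and exhaustive enumeration is only viable for toy databases.

```python
from itertools import combinations

# Assumption: Table 1 is the classic example database of Zaki and Hsiao [31].
DB = [frozenset(t) for t in ("ACTW", "CDW", "ACTW", "ACDW", "ACDTW", "CDT")]


def support(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def frequent_itemsets(db, min_sup):
    """Phase (i): level-by-level brute force; min_sup is a fraction of |D|."""
    items = sorted(set().union(*db))
    min_count = min_sup * len(db)
    fis = {}
    for k in range(1, len(items) + 1):
        level = {}
        for combo in combinations(items, k):
            x = frozenset(combo)
            s = support(x, db)
            if s >= min_count:
                level[x] = s
        if not level:  # anti-monotonicity: no frequent k-itemset, so none larger
            break
        fis.update(level)
    return fis


def traditional_rules(fis, min_conf):
    """Phase (ii): all rules X -> (Z - X) with Z frequent and conf >= min_conf."""
    rules = []
    for z, sup_z in fis.items():
        for k in range(1, len(z)):
            for combo in combinations(sorted(z), k):
                x = frozenset(combo)
                conf = sup_z / fis[x]  # X is a proper subset of Z, so X is frequent too
                if conf >= min_conf:
                    rules.append((x, z - x, sup_z, conf))
    return rules


fis = frequent_itemsets(DB, 0.5)
rules = traditional_rules(fis, 0.8)
```

With minSup = 50%, this database has 19 frequent itemsets; among the generated rules are A → CW (confidence 1.0) and W → A (confidence 0.8).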
Mining traditional association rules generates many redundant rules. Several approaches have thus been proposed to reduce the number of rules and increase their usefulness for users [2], [14], [15], [29], [30]. Although these approaches may generate fewer rules than the traditional approach, the number of rules is still large. For example, for the Chess database with minSup = 70% and minConf = 0%, Zaki's method [30] generates 152,074 rules and Bastide et al.'s method [2] generates 3,373,625 rules. In fact, in a mined rule set, some rules can be inferred from other rules.
To illustrate this problem, consider the example database from [31] shown in Table 1:
If minimal non-redundant rules are mined with minSup = 50% and minConf = 80%, the result consists of nine rules: {, , , , , , , , }.
We can see that some of them are redundant according to their supports and confidences. For example, the rule is a weaker rule because stronger rules such as and exist. The rule is also a weaker rule because there is a stronger rule . Similarly, the rule should not be kept since a more general rule has already been generated. However, the two previous methods (non-redundant association rules and minimal non-redundant association rules) do not prune the above rules.
Based on these considerations, a method for pruning such weak rules is needed. We thus proposed an approach for mining the most generalization association rules (MGARs) [25] to generate a compact rule set. A most generalization association rule differs from a minimal non-redundant rule in that the former considers the condition of equal or higher confidence, instead of only equal confidence. That is, an association rule R: X → Y is a most generalization rule if and only if there does not exist an association rule with an equal or higher confidence value than R, but with a more specific antecedent part and a more general consequent part. We showed that the number of MGARs was smaller than those of the non-redundant association rules [30] and the minimal non-redundant association rules [2]. We also developed some theorems for directly pruning a large number of rules. The remaining rules were then checked with the aid of a hash table. However, the execution time for generating MGARs increased with the number of FCIs. When the number of FCIs was large, the algorithm spent much time traversing the FCIs and checking whether an FCI X is a subset of an FCI Y [25]. The time complexity of this checking was analyzed to be $O(|FCIs|^2)$.
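The MGAR condition can be made concrete with a naive pairwise filter. This is a minimal sketch, not the algorithm of [25]: it assumes that a rule X' → Y' counts as more general than X → Y when X' ⊆ X and Y ⊆ Y' (the subset/superset condition Bastide et al. [2] use for minimal non-redundant rules), and it keeps a rule only if no more general rule has an equal or higher confidence. The rule list is hypothetical, with confidences computed as support ratios on the classic Zaki and Hsiao example database, and the pairwise check is quadratic in the number of rules.

```python
from fractions import Fraction


def is_more_general(r_prime, r):
    """Assumed reading: X' -> Y' is more general than X -> Y when
    X' is a subset of X and Y is a subset of Y', and the rules differ."""
    (xp, yp, _), (x, y, _) = r_prime, r
    return xp <= x and y <= yp and (xp, yp) != (x, y)


def mgar_filter(rules):
    """Keep a rule unless a more general rule has equal or higher confidence.

    Naive pairwise check (quadratic in len(rules)); illustration only."""
    return [r for r in rules
            if not any(is_more_general(rp, r) and rp[2] >= r[2] for rp in rules)]


# Hypothetical rules (antecedent, consequent, confidence); each confidence is a
# support ratio on the assumed example database, e.g. sigma(ACW)/sigma(A) = 4/4.
rules = [
    (frozenset("A"), frozenset("W"), Fraction(4, 4)),
    (frozenset("A"), frozenset("CW"), Fraction(4, 4)),
    (frozenset("AT"), frozenset("W"), Fraction(3, 3)),
    (frozenset("W"), frozenset("A"), Fraction(4, 5)),
    (frozenset("W"), frozenset("AC"), Fraction(4, 5)),
    (frozenset("CW"), frozenset("A"), Fraction(4, 5)),
]
kept = mgar_filter(rules)
```

Here A → W, AT → W, W → A, and CW → A are dropped, leaving A → CW and W → AC. Exact fractions are used so that "equal confidence" is an exact comparison rather than a floating-point one.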
A useful application of this result is fast prediction. In prediction, the strongest rule in a rule set is often used to predict an item or a sequence of items. A smaller rule set requires less matching time. Therefore, mining a smaller rule set can help reduce the prediction time.
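To make the matching-cost argument concrete, the sketch below predicts with the strongest (highest-confidence) rule whose antecedent is contained in the observed items. The linear scan, the tie-breaking policy (the first rule listed wins), and the sample rules are illustrative assumptions, and `predict` is a hypothetical name. The point is only that each prediction touches every rule once, so a smaller rule set means less matching work.

```python
from fractions import Fraction

# Hypothetical rule set (antecedent, consequent, confidence); confidences are
# support ratios on the assumed Zaki and Hsiao example database.
RULES = [
    (frozenset("A"), frozenset("CW"), Fraction(1)),
    (frozenset("W"), frozenset("AC"), Fraction(4, 5)),
    (frozenset("D"), frozenset("C"), Fraction(1)),
]


def predict(observed, rules):
    """Consequent (minus already-observed items) of the highest-confidence rule
    whose antecedent is contained in `observed`; None if no rule matches.

    One pass over `rules`, so the matching cost grows linearly with the rule
    count; on ties the rule listed first wins."""
    best_items, best_conf = None, None
    for antecedent, consequent, conf in rules:
        if antecedent <= observed and (best_conf is None or conf > best_conf):
            best_items, best_conf = consequent - observed, conf
    return best_items
```

For example, `predict({"W"}, RULES)` matches only W → AC and returns the items A and C, while `predict({"A", "W"}, RULES)` prefers A → CW (confidence 1) and returns just C.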
Some lattice-based approaches for quickly mining association rules have recently been proposed [26], [27]. In this paper, we thus adopt a lattice structure and propose a lattice-based approach for quickly mining MGARs from a set of transactions. The proposed approach consists of two parts: the first part builds a frequent-closed-itemset lattice (FCIL) from the FCIs, and the second part mines MGARs from the constructed FCIL. Experimental results show that the proposed lattice-based approach is more efficient than the previous FCI-based one in most cases.
The rest of this paper is organized as follows. Section 2 reviews studies related to mining association rules and building lattices. Section 3 presents an algorithm for building an FCIL. Section 4 proposes the theorem and the algorithm for generating the most generalization association rules based on the FCIL. Section 5 reports experimental results on the performance and memory usage of the proposed algorithm. Section 6 concludes our work.
Mining frequent closed itemsets
Mining frequent itemsets plays an important role in the association rule mining process. A frequent itemset can be defined as follows. Let D be a transaction database and I be the set of items in D. The support (count) σ(X) of an itemset X ($X \subseteq I$) is the number of transactions in D that contain X. An itemset X is called a frequent itemset if its support is greater than or equal to minSup, a predefined minimum support threshold.
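The support σ(X) can be computed by scanning D for every itemset. A common alternative, used by CHARM-style algorithms, is a vertical layout in which each item maps to its tidset (the ids of the transactions containing it), so σ(X) is the size of the intersection of the tidsets of X's items. The sketch below assumes that Table 1 is the classic Zaki and Hsiao example database and expresses minSup as a fraction of |D|. The names `TIDSETS`, `support`, and `is_frequent` are hypothetical.

```python
from functools import reduce

# Assumption: Table 1 is the classic Zaki and Hsiao example database (tid -> items).
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}

# Vertical layout: item -> tidset (ids of the transactions containing the item).
TIDSETS = {}
for tid, items in DB.items():
    for item in items:
        TIDSETS.setdefault(item, set()).add(tid)


def support(itemset):
    """sigma(X) as the size of the intersection of the tidsets of X's items.
    The empty itemset is contained in every transaction."""
    if not itemset:
        return len(DB)
    return len(reduce(set.intersection, (TIDSETS.get(i, set()) for i in itemset)))


def is_frequent(itemset, min_sup):
    """min_sup is given as a fraction of |D| (0.5 means 50%)."""
    return support(itemset) >= min_sup * len(DB)
```

For instance, the tidsets of T and W are {1, 3, 5, 6} and {1, 2, 3, 4, 5}, so σ(TW) = |{1, 3, 5}| = 3 and TW is frequent at minSup = 50%, whereas σ(AD) = 2 is not.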
Frequent closed itemsets are a variant of frequent
Concept and algorithm
As mentioned above, Zaki and Hsiao [31] proposed the CHARM-L algorithm for building an FCIL. When such an FCIL is traversed to generate rules, the itemsets may not be visited in order of their lengths. For example, consider the database shown in Table 1, which consists of six transactions and five items.
The FCIL built from the database in Table 1 with minSup = 50% is shown in Fig. 1. If we traverse it using depth-first search (DFS), the list of FCIs is {C, CD, CDW, CT, ACTW, CW, ACW}. If we
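The DFS listing for the Table 1 example can be reproduced with a small brute-force sketch. It assumes that Table 1 is the classic six-transaction database of Zaki and Hsiao [31]. It finds the FCIs by checking that each frequent itemset equals its closure (the intersection of all transactions containing it), links each FCI to its immediate closed supersets (parent ⊂ child), and visits children in lexicographic order, an assumption that happens to reproduce the listing. This is neither CHARM-L nor the construction algorithm of this section. All names are hypothetical, and the cubic-time edge computation only suits tiny inputs.

```python
from itertools import combinations

# Assumption: Table 1 is the classic Zaki and Hsiao example database;
# minSup = 50% of 6 transactions gives a minimum support count of 3.
DB = [frozenset(t) for t in ("ACTW", "CDW", "ACTW", "ACDW", "ACDTW", "CDT")]
MIN_COUNT = 3


def name(itemset):
    """Itemset as a sorted string, e.g. {'W', 'C'} -> 'CW'."""
    return "".join(sorted(itemset))


def frequent_closed_itemsets():
    """Brute force: X is an FCI if sigma(X) >= MIN_COUNT and X equals its closure."""
    items = sorted(set().union(*DB))
    fcis = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            covering = [t for t in DB if x <= t]
            if len(covering) >= MIN_COUNT and frozenset.intersection(*covering) == x:
                fcis[x] = len(covering)
    return fcis


def build_fcil(fcis):
    """Children of X = FCIs strictly containing X with no FCI in between."""
    children = {}
    for x in fcis:
        supers = [y for y in fcis if x < y]
        children[x] = sorted((y for y in supers if not any(z < y for z in supers)), key=name)
    return children


def dfs_order(children, roots):
    """Depth-first listing; a node reachable from two parents is listed once."""
    order, seen = [], set()

    def visit(x):
        if x in seen:
            return
        seen.add(x)
        order.append(name(x))
        for child in children[x]:
            visit(child)

    for root in sorted(roots, key=name):
        visit(root)
    return order


fcis = frequent_closed_itemsets()
children = build_fcil(fcis)
roots = [x for x in fcis if not any(y < x for y in fcis)]
order = dfs_order(children, roots)  # ['C', 'CD', 'CDW', 'CT', 'ACTW', 'CW', 'ACW']
by_length = sorted(order, key=len)  # ['C', 'CD', 'CT', 'CW', 'CDW', 'ACW', 'ACTW']
```

Comparing `order` with `by_length` shows the issue described above: DFS reaches ACTW (through CT) before the shorter ACW.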
Mining most generalization association rules from FCIL
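This section presents an algorithm for generating MGARs from an FCIL. The following theorem is first derived.
Theorem 1. Given three nodes l1, l2, and l3 in FCIL, if l1 is the parent node of l2, l2 is the parent node of l3, and $\frac{\sigma(l_2)}{\sigma(l_1)} < minConf$, then $\frac{\sigma(l_3)}{\sigma(l_1)} < minConf$.
Proof. Since l1 is the parent node of l2 and l2 is the parent node of l3, $l_1 \subset l_2 \subset l_3$. This implies that $\sigma(l_1) \ge \sigma(l_2) \ge \sigma(l_3)$. Thus, $\frac{\sigma(l_3)}{\sigma(l_1)} \le \frac{\sigma(l_2)}{\sigma(l_1)}$. Since $\frac{\sigma(l_2)}{\sigma(l_1)} < minConf$, it implies $\frac{\sigma(l_3)}{\sigma(l_1)} < minConf$. □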
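Because each child node's itemset strictly contains its parent's, supports can only shrink along a downward path in the FCIL. The confidence σ(l)/σ(l1) of a rule generated from a fixed ancestor l1 therefore can only shrink as well. The sketch below uses this monotonicity to skip whole sub-lattices once a node falls below minConf. It hard-codes the FCIL that the classic Zaki and Hsiao example database yields at minSup = 50% (an assumption about Table 1) and generates only rules of the form l1 → l \ l1. It illustrates the pruning idea rather than the paper's MGAR algorithm, and `rules_from` is a hypothetical name.

```python
from fractions import Fraction

# Assumption: the FCIL obtained from the classic Zaki and Hsiao example database
# at minSup = 50%, hard-coded as support counts and parent -> children edges.
SUPPORT = {"C": 6, "CD": 4, "CT": 4, "CW": 5, "ACW": 4, "CDW": 3, "ACTW": 3}
CHILDREN = {"C": ["CD", "CT", "CW"], "CD": ["CDW"], "CT": ["ACTW"],
            "CW": ["ACW", "CDW"], "ACW": ["ACTW"], "CDW": [], "ACTW": []}


def rules_from(l1, min_conf):
    """Rules l1 -> (l minus l1) for descendants l of l1 with
    sigma(l)/sigma(l1) >= min_conf. Returns (rules, visited nodes).

    A node failing the threshold is not expanded: its descendants have no
    larger support, hence no larger confidence."""
    rules, visited = [], []
    stack = list(CHILDREN[l1])
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.append(node)
        conf = Fraction(SUPPORT[node], SUPPORT[l1])
        if conf < min_conf:
            continue  # prune the whole sub-lattice below `node`
        consequent = "".join(sorted(set(node) - set(l1)))
        rules.append((l1, consequent, conf))
        stack.extend(CHILDREN[node])
    return rules, visited


rules, visited = rules_from("C", Fraction(4, 5))
```

Starting from C with minConf = 80%, only C → W (confidence 5/6) survives. ACTW is never visited because both of its parents, CT and ACW, fail the threshold and are not expanded.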
Experimental results
Experiments were conducted to evaluate the performance of the algorithms. The algorithms were coded in C# 2008 and run on a PC with a Centrino Core 2 Duo (2 × 2.53 GHz) processor and 4 GB of RAM, running Windows 7. Seven databases from [6] were used for the experiments; their features are shown in Table 4.
Conclusions and future work
In this paper, we have proposed an effective approach for mining most generalization association rules from transaction databases. It uses the FCIL to quickly find all pairs {X, Y} in which both X and Y are FCIs and $X \subset Y$. An algorithm for building the FCIL has also been proposed. Experimental results show that the lattice-based approach is more efficient than the FCI-based one in most cases. In addition, the proposed lattice-based approach is well suited for reuse. If association rules under
Acknowledgment
This work was supported by Vietnam’s National Foundation for Science and Technology Development (NAFOSTED) under Grant Number 102.01-2012.47.
References (32)
- et al., An incremental algorithm to construct a lattice of set intersections, Science of Computer Programming (2009).
- et al., An efficient algorithm for mining closed inter-transaction itemsets, Data & Knowledge Engineering (2008).
- et al., Finding association rules in semantic web data, Knowledge-Based Systems (2012).
- et al., Efficient mining of association rules using closed itemset lattices, Information Systems (1999).
- et al., Interestingness for association rules: combination between lattice and hash tables, Expert Systems with Applications (2011).
- et al., Text clustering using frequent itemsets, Knowledge-Based Systems (2010).
- R. Agrawal, R. Srikant, Fast algorithms for mining association rules, in: Proceedings of Very Large Databases ’94...
- Y. Bastide, N. Pasquier, R. Taouil, G. Stumme, L. Lakhal, Mining minimal non-redundant association rules using frequent...
- et al., Fast algorithms for frequent itemset mining using FP-trees, IEEE Transactions on Knowledge and Data Engineering (2005).
- B. Ganter, R. Wille, Formal Concept Analysis, Springer-Verlag,...