Abstract
The NIS-Apriori algorithm, an extension of the Apriori algorithm, was proposed for rule generation from non-deterministic information systems and was implemented in SQL. The realized system handles the concepts of certainty, possibility, and three-way decisions. This paper newly focuses on a characteristic of table data sets: there is usually a fixed decision attribute. It is therefore enough to handle itemsets with one decision attribute, and each frequent itemset defines exactly one implication. We make use of these characteristics and reduce the unnecessary itemsets to improve the performance of execution. Experiments with the implemented software tool in Python clarify the improved performance.
1 Introduction
We are following rough set based rule generation from table data sets [10, 14, 22] and Apriori based rule generation from transaction data sets [1, 2, 9], and we are investigating a new framework of rule generation from table data sets with information incompleteness [17, 18, 19, 20, 21].
Table 1 is a standard table. We term such a table a Deterministic Information System (DIS). For DISs, several rough set based rule generation methods have been proposed [3, 5, 10, 14, 16, 22, 23]. Furthermore, missing values ‘?’ [6, 7, 11] (Table 2) and Non-deterministic Information Systems (NISs) [12, 13, 15] (Table 3) were also investigated to cope with information incompleteness. In [12], question-answering based on possible world semantics was investigated, and an axiom system was given for query translation to one equivalent normal form.
In a NIS, some attribute values are given as a set of possible attribute values due to information incompleteness. In Table 2, \(\{2,3\}\) in x2 implies ‘either 2 or 3 is the actual value, but there is no information to decide it’, and ‘?’ means that there is no information at all. We replace each ‘?’ with the set of all possible attribute values and obtain Table 3. Thus, we can handle ‘?’ in a NIS (some discretization may be necessary for continuous attribute values). Formerly, question-answering and information retrieval in NISs were investigated; we are now coping with rule generation from NISs.
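A minimal sketch of this replacement (our illustration, not the paper's implementation), assuming a table stored as a list of dicts whose cells are a single value, a set of possible values, or ‘?’:

```python
def expand_missing(rows, attributes):
    """Replace each '?' by the set of all values observed for that attribute,
    turning a table like Table 2 into a NIS in the style of Table 3."""
    domains = {a: set() for a in attributes}
    for row in rows:
        for a in attributes:
            v = row[a]
            if v != '?':
                domains[a].update(v if isinstance(v, set) else {v})
    return [{a: (set(domains[a]) if row[a] == '?'
                 else (set(row[a]) if isinstance(row[a], set) else {row[a]}))
             for a in attributes}
            for row in rows]
```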
The Apriori algorithm [1] was proposed by Agrawal for handling transaction data sets. We adjust this algorithm to DISs and NISs by using the characteristics of table data sets. The highlights of this paper are the following:

(1) A brief survey of Apriori based rule generation and a rule generator,

(2) Some improvements of the Apriori based algorithm and a rule generator,

(3) Experiments with the improved rule generator in Python.
This paper is organized as follows: Sect. 2 surveys our framework on NISs and the Apriori algorithm [1, 2, 9]. Section 3 connects table data sets to transaction data sets and copes with the manipulation of candidates of rules. Then, more effective manipulation is proposed in DISs and NISs. Section 4 describes a new NIS-Apriori based system in Python and presents the improved results. Section 5 concludes this paper.
2 Preliminary: An Overview of Rule Generation and Examples
This section briefly reviews rule generation from DISs and NISs.
2.1 Rules and Rule Generation from DISs
In Table 1, we consider implications like \([P,3]\Rightarrow [Dec,a]\) from x1 and \([R,2]\wedge [S,1]\Rightarrow [Dec,b]\) from x3. Generally, a rule is defined as an implication satisfying some constraint. The following is one standard definition of rules [1, 2, 9, 14, 22]. We follow this definition and consider the following rule generation from DIS.
(A rule from DIS). A rule is an implication \(\tau \) satisfying \(support(\tau )\ge \alpha \) and \(accuracy(\tau )\ge \beta \) (\(0< \alpha ,~\beta \le 1.0\)) for given threshold values \(\alpha \) and \(\beta \).
(Rule generation from DIS). If we fix \(\alpha \) and \(\beta \) in DIS, the set of all rules is also fixed, but we generally do not know them. Rule generation is to generate all minimal rules (we term a rule with minimal condition part a minimal rule).
Fig. 1. All minimal rules (\(support(\tau )\ge 0.2\), \(accuracy(\tau )\ge 0.9\)) obtained from Table 1. Our system ensures that there is no other rule. The first rule in Fig. 1 is \(\tau : [P,1]\Rightarrow [Dec,b]\). Even though \(\tau ': [P,1]\wedge [Q,2]\Rightarrow [Dec,b]\) satisfies the constraint of rules, \(\tau '\) is a redundant implication of \(\tau \), so \(\tau '\) is not minimal.
Here, \(support(\tau )\) is an occurrence ratio of an implication \(\tau \) for the total objects and \(accuracy(\tau )\) is a consistency ratio of \(\tau \) for the condition part of \(\tau \). For example, let us consider \(\tau : [R,2]\wedge [S,1]\Rightarrow [Dec,b]\) from x3. Since \(\tau \) occurs one time for five objects, we have \(support(\tau )\) = 1/5. Since \([R,2]\wedge [S,1]\) occurs two times, we have \(accuracy(\tau )\) = 1/2. Fig. 1 shows all minimal rules (redundant rules are not generated) from Table 1.
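As a small illustration of these definitions, the following sketch (ours, with hypothetical names; Table 1 itself appears only in the paper's figures) computes support and accuracy of one implication over a DIS stored as a list of dicts:

```python
def support_accuracy(table, condition, decision):
    """support = |objects with condition AND decision| / |objects|;
    accuracy = |objects with condition AND decision| / |objects with condition|."""
    n = len(table)
    cond_rows = [row for row in table
                 if all(row[a] == v for a, v in condition.items())]
    both = [row for row in cond_rows if row[decision[0]] == decision[1]]
    support = len(both) / n
    # accuracy is taken as 0.0 when the condition never occurs (a simplifying choice)
    accuracy = len(both) / len(cond_rows) if cond_rows else 0.0
    return support, accuracy

# For tau: [R,2] AND [S,1] => [Dec,b] of Table 1, this would return (0.2, 0.5):
# support_accuracy(table1, {'R': 2, 'S': 1}, ('Dec', 'b'))
```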
2.2 Rules and Rule Generation from NISs
From now on, we employ the symbols \(\varPhi \) and \(\psi \) for expressing a NIS and a DIS, respectively. In a NIS \(\varPhi \), we replace each set of possible attribute values with one of its elements, and then we have one DIS. We term such a DIS a derived DIS from NIS, and let \(DD(\varPhi )\) denote the set of all derived DISs from NIS. Table 1 is a derived DIS from Table 3. In NISs like Table 3, we consider the following two types of rules:
(1) A rule which we certainly conclude from NIS (a certain rule),

(2) A rule which we may conclude from NIS (a possible rule).
These two types of rules seem to be natural for rule generation with information incompleteness. Yao recalls three-valued logic in rough sets and proposes three-way decisions [23, 24]. These types of rules concerning missing values were also investigated in [6, 11], and we coped with the following two types of rules based on possible world semantics [18, 20]. The definition in [6, 11] and the following definition are semantically different [18].
(A certain rule from NIS). An implication \(\tau \) is a certain rule, if \(\tau \) is a rule in each derived DIS from NIS.
(A possible rule from NIS). An implication \(\tau \) is a possible rule, if \(\tau \) is a rule in at least one derived DIS from NIS.
(Rule generation from NIS). If we fix \(\alpha \) and \(\beta \) in NIS, the set of all certain rules and the set of all possible rules are also fixed. Rule generation is to generate all minimal certain rules and all minimal possible rules.
The two types of rules depend on all derived DISs from NIS, and the number of derived DISs increases exponentially. For Table 3, the number is 324 (=\(2^2\times 3^4\)), and it is more than \(10^{100}\) for the Mammographic data set [4]. Thus, the realization of a system handling the two types of rules seemed hard; however, we gave one solution to this problem.
(Proved Property). For each implication \(\tau \), we developed some formulas to calculate the following:

(1) \(minsupp(\tau )=\min _{\psi \in DD(\varPhi )}\{support(\tau ) \text{ in } \psi \}\),

(2) \(minacc(\tau )=\min _{\psi \in DD(\varPhi )}\{accuracy(\tau ) \text{ in } \psi \}\),

(3) \(maxsupp(\tau )=\max _{\psi \in DD(\varPhi )}\{support(\tau ) \text{ in } \psi \}\),

(4) \(maxacc(\tau )=\max _{\psi \in DD(\varPhi )}\{accuracy(\tau ) \text{ in } \psi \}\).
This calculation employs rough-set-based concepts and is independent of the number of derived DISs [18, 20, 21]. By using these formulas, we proved a method to examine whether \(\tau \) is a certain rule and whether \(\tau \) is a possible rule. This method is also independent of the number of all derived DISs [18, 20, 21].
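To make the definitions concrete, the following brute-force sketch (ours) computes the four values by enumerating every derived DIS. It reuses support_accuracy from the sketch in Sect. 2.1 and only works for tiny tables, which is precisely why the formulas above, being independent of \(|DD(\varPhi )|\), matter:

```python
from itertools import product

def min_max_support_accuracy(nis, condition, decision):
    """nis: list of dicts whose cells are sets of possible values.
    Enumerates every derived DIS (exponentially many) and takes min/max."""
    attrs = list(nis[0])
    # one list of choices per (object, attribute) cell, in row-major order
    cells = [sorted(row[a]) for row in nis for a in attrs]
    supports, accuracies = [], []
    for choice in product(*cells):
        it = iter(choice)
        dis = [{a: next(it) for a in attrs} for _ in nis]  # one derived DIS
        s, acc = support_accuracy(dis, condition, decision)
        supports.append(s)
        accuracies.append(acc)
    return min(supports), min(accuracies), max(supports), max(accuracies)
```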
Fig. 2. All minimal certain rules (\(support(\tau )\ge 0.2\), \(accuracy(\tau )\ge 0.9\)) obtained from Table 3. There is no other certain rule.

Fig. 3. All minimal possible rules (\(support(\tau )\ge 0.2\), \(accuracy(\tau )\ge 0.9\)) obtained from Table 3. There is no other possible rule.
We apply this property to the Apriori algorithm for realizing a rule generation system. The Apriori algorithm effectively enumerates itemsets (candidates of rules), and the support and accuracy values of every candidate are calculated by the Proved Property. Figures 2 and 3 show the obtained minimal certain rules and minimal possible rules from Table 3. As for the execution time, we discuss it in Sect. 4.
2.3 A Relation Between Rules in DISs and Rules in NISs
Let \(\psi ^{actual}\) be the derived DIS with actual information from NIS \(\varPhi \) (we cannot decide \(\psi ^{actual}\) from \(\varPhi \), but we suppose there is an actual \(\psi ^{actual}\) for \(\varPhi \)). Then, we easily have the next inclusion relation: \(\{\text{certain rules in } \varPhi \}\subseteq \{\text{rules in } \psi ^{actual}\}\subseteq \{\text{possible rules in } \varPhi \}\).
Due to information incompleteness, we know lower and upper approximations of a set of rules in \(\psi ^{actual}\). This property follows the concept of rough sets based approximations.
2.4 The Apriori Algorithm for Transaction Data Sets
Let us consider Table 4, which shows four persons’ purchase of items. Such structured data is termed a transaction data set. In this data set, let us focus on a set \(\{ham,beer\}\). Such a set is generally termed an itemset. For this itemset, we consider two implications \(\tau _{1}: ham\Rightarrow beer\) and \(\tau _{2}: beer\Rightarrow ham\). In \(\tau _{1}\), \(support(\tau _{1})\) = 3/4 and \(accuracy(\tau _{1})\) = 3/3. In \(\tau _{2}\), \(support(\tau _{2})\) = 3/4 and \(accuracy(\tau _{2})\) = 3/4. For an itemset \(\{ham,beer,corn\}\), we consider six implications, \(ham\wedge beer\Rightarrow corn\), \(\cdots \), \(beer\Rightarrow corn\wedge ham\). Like this, Agrawal proposed a method to obtain rules from transaction data sets, which is known as the Apriori algorithm [1, 2, 9]. This algorithm makes use of the following.
(Monotonicity of support). For two itemsets P and Q, if P \(\subseteq \) Q, \(support(Q)\le support(P)\) holds.
By using this property, the Apriori algorithm enumerates all itemsets which satisfy \(support\ge \alpha \). Each such itemset is termed a frequent itemset. Let us consider the manipulation of itemsets in Table 4 under \(support\ge 0.5\). Since there are four transactions, each frequent itemset must occur at least twice (2/4 = 0.5). Let \(CAN_{i}\) and \(FI_{i}\) (\(i\ge 0\)) denote a set of all candidates of itemsets and a set of all frequent itemsets consisting of \((i+1)\) items, respectively. We have the following.
Each element in \(CAN_{i}\) (\(i\ge 1\)) is generated by the combination of two itemsets in \(FI_{i-1}\) [1, 2]. Then, every itemset satisfying the support condition becomes an element of \(FI_{i}\). For example, for \(A:\{ham,corn\}\), \(B:\{beer,cheese\}\in FI_{1}\), we add one element of B to A and have \(\{ham,corn,beer\}\), \(\{ham,corn,cheese\}\in CAN_{2}\). We also do the converse and have \(\{beer,cheese,ham\}\), \(\{beer,cheese,corn\}\in CAN_{2}\). Only one itemset \(\{ham,corn,beer\}\) satisfies the support condition and becomes an element of \(FI_{2}\). Like this, \(FI_{1}\), \(FI_{2}\), \(\cdots \), \(FI_{n}\) are obtained at first, then the accuracy value of each implication defined by a frequent itemset is evaluated. In the subsequent sections, we change the above manipulation by using the characteristics of table data sets.
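The generic loop just described can be sketched compactly as follows (our illustration of the Apriori manipulation [1, 2], with hypothetical toy data; recall the paper indexes \(FI_{i}\) so that its itemsets have \(i+1\) items):

```python
def apriori_frequent_itemsets(transactions, alpha):
    """Generic Apriori loop: transactions are sets of items."""
    n = len(transactions)

    def supp(s):
        return sum(s <= t for t in transactions) / n

    fi = {frozenset([i]) for t in transactions for i in t}
    fi = {s for s in fi if supp(s) >= alpha}       # 1-item frequent sets (FI_0)
    all_fi, k = set(fi), 2
    while fi:
        # join two frequent (k-1)-item sets into k-item candidates (CAN_{k-1})
        can = {a | b for a in fi for b in fi if len(a | b) == k}
        fi = {c for c in can if supp(c) >= alpha}  # FI_{k-1}
        all_fi |= fi
        k += 1
    return all_fi

# toy data in the spirit of Table 4 (the exact table is in the paper's figure):
# apriori_frequent_itemsets([{'ham','beer','corn'}, {'ham','beer'},
#                            {'ham','beer','cheese'}, {'corn','cheese'}], 0.5)
```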
3 Some Improvements of the NIS-Apriori Based Rule Generator
We describe the improvements in our framework based on Sect. 2.
3.1 From Transaction Data Sets to Table Data Sets
We translate Table 1 to Table 5 and identify each descriptor with an item. Then, we can see that Table 5 is a transaction data set. Thus, we can apply the Apriori algorithm to rule generation.
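This translation is straightforward; a sketch under the list-of-dicts convention used above:

```python
def table_to_transactions(table):
    """Identify each descriptor [A, v] with the item (A, v); one object of a
    DIS then becomes one transaction, as in Table 5."""
    return [frozenset((a, v) for a, v in row.items()) for row in table]

# e.g. a row {'P': 3, 'Q': 2, 'R': 2, 'S': 1, 'Dec': 'b'} (values hypothetical)
# becomes {('P', 3), ('Q', 2), ('R', 2), ('S', 1), ('Dec', 'b')}
```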
We define the next sets \(IMP_{1}\), \(IMP_{2}\), \(\cdots \), \(IMP_{n}\):

- \(IMP_{1}=\{[A,val_{A}]\Rightarrow [Dec,val]\}\),

- \(IMP_{2}=\{[A,val_{A}]\wedge [B,val_{B}]\Rightarrow [Dec,val]\}\),

- \(IMP_{3}=\{[A,val_{A}]\wedge [B,val_{B}]\wedge [C,val_{C}]\Rightarrow [Dec,val]\}\), \(\cdots \)
Here, \(IMP_{i}\) means a set of implications whose condition parts consist of i attributes. A minimal rule is an implication \(\tau \in \cup _{i}IMP_{i}\), and one could examine each \(\tau \in \cup _{i}IMP_{i}\) exhaustively. However, in the subsequent sections, we consider more effective manipulations to generate minimal rules in \(IMP_{1}\), \(IMP_{2}\), \(\cdots \), sequentially.
3.2 The Manipulation I for Frequent Itemsets by the Characteristics of Table Data Sets
Here, we make use of the characteristics of table data sets below.
(TA1). The decision attribute Dec is fixed. So, it is enough to consider each itemset including exactly one descriptor whose attribute is Dec. For example, we do not handle any itemset like \(\{[P,3],[Q,2]\}\) or \(\{[P,3],[Dec,a],[Dec,b]\}\) in Table 5.

(TA2). An attribute is related to each descriptor. So, we only handle itemsets whose descriptors have mutually different attributes. For example, we do not handle any itemset like \(\{[P,3],[P,1],[Q,2],[Dec,b]\}\) in Table 5.
(TA3). To consider implications, we handle \(CAN_{1}\), \(FI_{1}\) (\(\subseteq IMP_{1})\), \(CAN_{2}\), \(FI_{2}\) (\(\subseteq IMP_{2})\), \(\cdots \), which are defined in Sect. 2.4.
Fig. 4. The Apriori algorithm adjusted to a table data set DIS \(\psi \). The accuracy value can be examined in each while loop (the rectangular area circled by the dotted line); this examination is not done in the Apriori algorithm for transaction data sets.
Based on the above characteristics, we can consider Fig. 4, where itemsets satisfying (TA1) and (TA2) are enumerated. Generally, in the Apriori algorithm, the accuracy value is examined after obtaining all \(FI_{i}\), because the decision attribute is not fixed and each itemset in \(FI_{i}\) defines plural implications. In a table data set, however, each frequent itemset corresponds to exactly one implication. We employed this property and proposed the Apriori algorithm adjusted to table data sets [20, 21] in Fig. 5. We term this algorithm the DIS-Apriori algorithm. Here, we calculate the accuracy value of every frequent itemset in each while loop (the rectangular area circled by the dotted line in Fig. 4 and lines 5-7 in Fig. 5). By extending the DIS-Apriori algorithm, we can easily handle certain rules and possible rules in NISs.
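A minimal sketch of this adjusted loop (ours, not the code of Figs. 4 and 5; it reuses support_accuracy from Sect. 2.1 and assumes the decision attribute is named 'Dec'):

```python
def dis_apriori(table, alpha, beta, dec='Dec'):
    """Each candidate pairs one condition itemset with ONE decision
    descriptor (TA1, TA2); accuracy is checked inside every loop."""
    cond_attrs = [a for a in table[0] if a != dec]
    # CAN_1: one condition descriptor plus one decision descriptor
    can = {(frozenset({(a, row[a])}), (dec, row[dec]))
           for row in table for a in cond_attrs}
    rules = []
    while can:
        fi = []
        for cond, decision in can:
            s, acc = support_accuracy(table, dict(cond), decision)
            if s >= alpha:                         # member of FI_i
                if acc >= beta:
                    rules.append((cond, decision))  # member of Rule_i
                fi.append((cond, decision))
        # CAN_{i+1}: extend each frequent implication by one new attribute
        can = {(cond | {(a, row[a])}, decision)
               for cond, decision in fi
               for row in table
               for a in cond_attrs if a not in dict(cond)}
    return rules
# NOTE: as written, extensions of rules are also generated, so non-minimal
# rules may appear; the manipulation II in Sect. 3.3 removes exactly these.
```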
Proposition 1
(1) We replace DIS \(\psi \) with NIS \(\varPhi \), support and accuracy with minsupp and minacc, respectively. Then, this algorithm generates all minimal certain rules.

(2) We replace DIS \(\psi \) with NIS \(\varPhi \), support and accuracy with maxsupp and maxacc, respectively. Then, this algorithm generates all minimal possible rules.

(3) We term the algorithm consisting of (1) and (2) the NIS-Apriori algorithm.
Both the DIS-Apriori and NIS-Apriori algorithms are logically sound and complete for rules. They generate rules without excess or deficiency.

Figures 1, 2 and 3, produced by the rule generator in SQL, are based on the algorithm in Fig. 5 and Proposition 1.
3.3 The Manipulation II for Frequent Itemsets by the Characteristics of Table Data Sets
Now, we advance the manipulation I to the manipulation II. We focus on the statement ‘create \(FI_{i}\)’ in lines 2 and 10 in Fig. 5. In every while loop, we examine each \(\tau \in FI_{i}\subseteq CAN_{i}\subseteq IMP_{i}\), so reducing the sets \(CAN_{i}\) and \(FI_{i}\) directly improves the performance of execution. In Fig. 5, we first remark the following.
(Rule generation). The purpose of rule generation is to generate each minimal implication \(\tau \in \cup _{i}IMP_{i}\) satisfying \(support(\tau )\ge \alpha \) and \(accuracy(\tau )\ge \beta \). We obtain \(Rule_{1}, Rest_{1}\subseteq IMP_{1}\) in the 1st while loop, \(Rule_{2}, Rest_{2}\subseteq IMP_{2}\) in the 2nd while loop, and \(Rule_{3}, Rest_{3}\) in the 3rd while loop, \(\cdots \).
(Relation between sets in Fig. 5). We clarify the relation and the definition of \(NOrule_{i}\) below.
(1) \(Rule_{i}=\{\tau \in IMP_{i}~|~support(\tau )\ge \alpha ,~accuracy(\tau )\ge \beta \}\),

(2) \(Rest_{i}=\{\tau \in IMP_{i}~|~support(\tau )\ge \alpha ,~accuracy(\tau )<\beta \}\),

(3) \(FI_{i}=\{\tau \in IMP_{i}~|~support(\tau )\ge \alpha \}\),

(4) \(NOrule_{i}=\{\tau \in IMP_{i}~|~support(\tau )<\alpha \}\),

(5) \(IMP_{i}=FI_{i}\cup NOrule_{i}=(Rule_{i}\cup Rest_{i})\cup NOrule_{i}\).
(A case of \(\tau \in Rule_{i}\)). If \(\tau : \wedge _{j} [A_{j},val_{j}]\Rightarrow [Dec,val]\in Rule_{i}\), we do not deal with any redundant implication \(\tau ': (\wedge _{j} [A_{j},val_{j}])\wedge [B,b]\Rightarrow [Dec,val]\in IMP_{i+1}\), because \(\tau '\) cannot be a minimal rule.
(A case of \(\tau \in NOrule_{i}\)). If \(\tau : \wedge _{j} [A_{j},val_{j}]\Rightarrow [Dec,val]\in NOrule_{i}\), any redundant implication \(\tau ': (\wedge _{j} [A_{j},val_{j}])\wedge [B,b]\Rightarrow [Dec,val]\) satisfies \(support(\tau ')<\alpha \). So, \(\tau '\in IMP_{i+1}\) cannot be a rule. Thus, we do not deal with any redundant implication \(\tau '\).
(A case of \(\tau \in Rest_{i}\)). Unlike support, accuracy is not monotonic (an example is in [20]). Thus, if \(\tau : \wedge _{j} [A_{j},val_{j}]\Rightarrow [Dec,val]\in Rest_{i}\), \(accuracy(\tau ')\ge \beta \) may hold for a redundant implication \(\tau ': (\wedge _{j} [A_{j},val_{j}])\wedge [B,b]\Rightarrow [Dec,val]\in FI_{i+1}\).
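For instance (a two-object illustration of our own, independent of the example in [20]): in a DIS with x1 = ([A, 1], [B, 1], [Dec, a]) and x2 = ([A, 1], [B, 2], [Dec, b]), we have \(accuracy([A,1]\Rightarrow [Dec,a])=1/2\), whereas \(accuracy([A,1]\wedge [B,1]\Rightarrow [Dec,a])=1/1\); adding a condition descriptor raised the accuracy value.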
Proposition 2
Let us suppose that we had \(Rule_{i}\) and \(Rest_{i}\) \((IMP_{i}\)=\(Rule_{i}\cup Rest_{i}\cup NOrule_{i})\) in the i-th while loop in Fig. 5. Every candidate of a minimal rule in \(IMP_{i+1}\) is a redundant implication of \(\tau \in Rest_{i}\).
(Proof)
For every implication \(\tau \not \in FI_{i}\subseteq IMP_{i}\), its redundant implication \(\tau '\) satisfies \(support(\tau ')\le support(\tau )<\alpha \). Thus, \(\tau '\) cannot be a minimal rule in \(IMP_{i+1}\). Based on the Apriori algorithm, we need to combine two frequent itemsets in \(FI_{i}\)=\(Rule_{i}\cup Rest_{i}\) (an example of this combination is described in Sect. 2.4). However, for the minimality condition of rules, we do not handle any redundant implication of \(\tau \in Rule_{i}\). Thus, we conclude that every candidate of a minimal rule in \(IMP_{i+1}\) is a redundant implication of \(\tau \in Rest_{i}\).
Definition 1
We define a set \(RCAN_{i}~(\subseteq CAN_{i})\), whose elements are the candidates of minimal rules in \(IMP_{i}\) w.r.t. the rules \(\cup _{j=1,\cdots ,(i-1)}Rule_{j}\), and a set \(RFI_{i}=\{\tau \in RCAN_{i}~|~support(\tau )\ge \alpha \}~(\subseteq FI_{i}\subseteq IMP_{i})\).
In the Apriori algorithm, the concept of redundancy is not introduced, so some redundant rules may be generated. The sets \(CAN_{i}\) and \(FI_{i}\) in Fig. 4 are generated from \(FI_{i-1}\) (=\(Rule_{i-1}\cup Rest_{i-1}\)). However, we can generate \(RCAN_{i}~(\subseteq CAN_{i})\) and \(RFI_{i}~(\subseteq FI_{i})\) from \(Rest_{i-1}\). Furthermore, our previous implementation generated itemsets \(\{[A,a],[B,b],[Dec,v1]\},\{[A,a],[B,b],[Dec,v2]\}\in RCAN_{2}\) from \(\{[A,a],[Dec,v1]\}, \{[B,b],[Dec,v2]\}\in Rest_{1}\); we now remove this combination, because there is no object satisfying both [Dec, v1] and [Dec, v2]. This combination formerly generated meaningless itemsets, and this revision is another improvement in the manipulation of itemsets.
Proposition 3
The sets \(RCAN_{i}\) and \(RFI_{i}\) are given as follows: \(RCAN_{1}=CAN_{1}\) and \(RFI_{1}=FI_{1}\); for \(i\ge 2\), \(RCAN_{i}\) consists of the implications obtained by adding the descriptor [B, b] of some \(\tau ': [B,b]\Rightarrow [Dec,val]\in Rest_{1}\) (with the same decision [Dec, val]) to an implication in \(Rest_{i-1}\), and \(RFI_{i}=\{\tau \in RCAN_{i}~|~support(\tau )\ge \alpha \}\).
(New manipulation II of itemsets). We handle \(RCAN_{i}\subseteq CAN_{i}\) and \(RFI_{i}\subseteq FI_{i}\) for generating minimal rules. In the Apriori algorithm, \(CAN_{i}\) and \(FI_{i}\) are employed, so redundant rules may be generated. By using \(RCAN_{i}\) and \(RFI_{i}\), the candidates of rules are reduced, and the performance of execution is improved.
(Proof)
(In case of i = 1) \(RCAN_{1}\) = \(CAN_{1}\) and \(RFI_{1}\) = \(FI_{1}\) hold, because redundant rules occur only after the 2nd while loop.

(In case of \(i\ge 2\)) We add one descriptor [B, b] to \(\wedge _{j} [A_{j},val_{j}]\Rightarrow [Dec,val]\in Rest_{i-1}\) and have a redundant implication \(\tau : (\wedge _{j} [A_{j},val_{j}])\wedge [B,b]\Rightarrow [Dec,val]\in IMP_{i}\) due to Proposition 2.
(1) In order to handle the same decision, [B, b] must be the condition part of \(\tau ': [B,b]\Rightarrow [Dec,val]\in RFI_{1}\) = \(FI_{1}\). (If \(\tau '\not \in FI_{1}\), \(support(\tau )<\alpha \) holds and \(\tau \) cannot be a rule, because \(\tau \) is a redundant implication of \(\tau '\).)

(2) \(FI_{1}\) = \(Rule_{1}\cup Rest_{1}\) holds. If \(\tau '\in Rule_{1}\), \(\tau \) cannot be a minimal rule, because \(\tau '\) is a minimal rule.

Based on the above discussion, we conclude \(\tau '\in Rest_{1}\).
Based on the above propositions, we propose the manipulation II in Fig. 6. In the Apriori algorithm, \(CAN_{i}\) is generated from \(FI_{i-1}\), but we can remove the redundant implications of every \(\tau \in Rule_{i-1}\). Thus, we can handle \(RCAN_{i}\), which is a subset of \(CAN_{i}\). If the number of elements in \(Rule_{i-1}\) is large, the number of elements in \(RCAN_{i}\) will be much smaller than that of \(CAN_{i}\).
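A sketch of this candidate generation (ours, reusing the conventions of the earlier snippets: an implication is a pair of a frozenset of condition descriptors and one decision descriptor):

```python
def next_rcan(rest_prev, rest_1):
    """RCAN_i from Rest_{i-1} and Rest_1 (Proposition 3): extend each
    implication in Rest_{i-1} by one descriptor [B, b] taken from a tau'
    in Rest_1 with the SAME decision, avoiding repeated attributes."""
    rcan = set()
    for cond, decision in rest_prev:
        used = {a for a, _ in cond}
        for cond1, dec1 in rest_1:
            (b, v), = cond1          # Rest_1 conditions hold one descriptor
            if dec1 == decision and b not in used:
                rcan.add((cond | {(b, v)}, decision))
    return rcan
```

Replacing the candidate-generation step of the dis_apriori sketch above with next_rcan, i.e., extending \(Rest_{i}\) only rather than all of \(FI_{i}\), yields the behavior of the manipulation II.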
Proposition 4
The DIS-Apriori algorithm with the manipulation II is sound and complete for minimal rules in DIS, and the NIS-Apriori algorithm with the manipulation II is also sound and complete for minimal certain rules and minimal possible rules in NIS. They do not miss any rule defined in DIS \(\psi \) or NIS \(\varPhi \).
(Sketch of Proof). We have proved that the DIS-Apriori and NIS-Apriori algorithms are sound and complete [20, 21]. We newly introduced the sets \(RCAN_{i}\subseteq CAN_{i}\) and \(RFI_{i}\subseteq FI_{i}\) by using the redundancy of rules, and we extended the previous two algorithms to those with the manipulation II. The proposed algorithm does not examine each \(\tau \in \cup _{j}IMP_{j}\), but examines each \(\tau \in \cup _{j}RCAN_{j}\). As a result, it generates the same rules as the procedure ‘examine each \(\tau \in \cup _{j}IMP_{j}\)’ defines.
4 An Improved Apriori Based Rule Generator and Some Experiments
This section compares the NIS-Apriori algorithm with the NIS-Apriori algorithm with the manipulation II. Of course, the two algorithms generate the same rules due to Propositions 1 and 4, and the latter makes use of the redundancy concept. We newly implemented the two systems in Python (Windows PC, CPU: Intel i7-4600U, 2.7 GHz). Table 6 shows the results on the Car Evaluation data set [4], and Table 7 shows those on the Phishing data set [4]. They are cases of DISs, and the characteristic \(RCAN_{i}\subseteq CAN_{i}\) is effectively employed.
Now, we show two examples by the NIS-Apriori algorithm: one on the Congressional Voting data set [4], and the other on the Lithology data set [8]. As we described in Proposition 1, the NIS-Apriori algorithm (certain rule generation) is the DIS-Apriori algorithm with the criterion values minsupp and minacc. Thus, the number of candidate itemsets is also reduced by the manipulation II. The experiments clearly confirm the improvement by the manipulation II (Tables 8 and 9).
5 Concluding Remarks
We recently adjusted the Apriori algorithm to table data sets and proposed the DIS-Apriori and NIS-Apriori algorithms. This paper made use of the characteristics of table data sets (one fixed decision attribute Dec) and improved these algorithms. If we did not handle table data sets, there would be no need to consider Fig. 6. The framework of the manipulation II (Fig. 6) is an improvement of Apriori based rule generation by using the characteristics of table data sets. We can generate minimal rules by using \(RCAN_{i}\subseteq CAN_{i}\) and \(RFI_{i}\subseteq FI_{i}\), and this reduction decreases the number of candidate itemsets. We newly implemented the proposed algorithm in Python and examined the improved performance of execution by experiments.
References
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proceedings of VLDB 1994, pp. 487–499. Morgan Kaufmann (1994)
Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I.: Fast discovery of association rules. In: Advances in Knowledge Discovery and Data Mining, pp. 307–328. AAAI/MIT Press (1996)
Ciucci, D., Flaminio, T.: Generalized rough approximations in ŁΠ1/2. Int. J. Approx. Reason. 48(2), 544–558 (2008)
Frank, A., Asuncion, A.: UCI machine learning repository. School of Information and Computer Science, University of California, Irvine (2010). http://mlearn.ics.uci.edu/MLRepository.html. Accessed 10 July 2019
Greco, S., Matarazzo, B., Słowiński, R.: Granular computing and data mining for ordered data: the dominance-based rough set approach. In: Meyers, R.A. (ed.) Encyclopedia of Complexity and Systems Science, pp. 4283–4305. Springer, New York (2009). https://doi.org/10.1007/978-0-387-30440-3
Grzymała-Busse, J.W., Werbrouck, P.: On the best search method in the LEM1 and LEM2 algorithms. In: Orłowska, E. (ed.) Incomplete Information: Rough Set Analysis. Studies in Fuzziness and Soft Computing, vol. 13, pp. 75–91. Springer, Heidelberg (1998). https://doi.org/10.1007/978-3-7908-1888-8_4
Grzymala-Busse, J.W.: Data with missing attribute values: generalization of indiscernibility relation and rule induction. In: Peters, J.F., Skowron, A., Grzymała-Busse, J.W., Kostek, B., Świniarski, R.W., Szczuka, M.S. (eds.) Transactions on Rough Sets I. LNCS, vol. 3100, pp. 78–95. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-27794-1_3
Hossain, T.M., Watada, J., Hermana, M., Shukri, S.R., Sakai, H.: A rough set based rule induction approach to geoscience data. In: Proceedings of UMSO 2018. IEEE (2018). https://doi.org/10.1109/UMSO.2018.8637237
Jovanoski, V., Lavrač, N.: Classification rule learning with APRIORI-C. In: Brazdil, P., Jorge, A. (eds.) EPIA 2001. LNCS (LNAI), vol. 2258, pp. 44–51. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45329-6_8
Komorowski, J., Pawlak, Z., Polkowski, L., Skowron, A.: Rough sets: a tutorial. In: Pal, S.K., Skowron, A. (eds.) Rough Fuzzy Hybridization: A New Method for Decision Making, pp. 3–98. Springer, Heidelberg (1999)
Kryszkiewicz, M.: Rules in incomplete information systems. Inf. Sci. 113(3–4), 271–292 (1999)
Lipski, W.: On databases with incomplete information. J. ACM 28(1), 41–70 (1981)
Orłowska, E., Pawlak, Z.: Representation of nondeterministic information. Theoret. Comput. Sci. 29(1–2), 27–39 (1984)
Pawlak, Z.: Rough sets. Int. J. Comput. Inf. Sci. 11(5), 341–356 (1982)
Pawlak, Z.: Systemy Informacyjne: Podstawy Teoretyczne. WNT (1983). (in Polish)
Riza, L.S., et al.: Implementing algorithms of rough set theory and fuzzy rough set theory in the R package RoughSets. Inf. Sci. 287(10), 68–89 (2014)
Sakai, H., Ishibashi, R., Koba, K., Nakata, M.: Rules and apriori algorithm in non-deterministic information systems. Trans. Rough Sets 9, 328–350 (2008)
Sakai, H., Wu, M., Nakata, M.: Apriori-based rule generation in incomplete information databases and non-deterministic information systems. Fundam. Inf. 130(3), 343–376 (2014)
Sakai, H.: Execution logs by RNIA software tools. http://www.mns.kyutech.ac.jp/~sakai/RNIA. Accessed 10 July 2019
Sakai, H., Nakata, M.: Rough set-based rule generation and Apriori-based rule generation from table data sets: a survey and a combination. CAAI Trans. Intell. Technol. 4(4), 203–213 (2019)
Sakai, H., Nakata, M., Watada, J.: NIS-Apriori-based rule generation with three-way decisions and its application system in SQL. Inf. Sci. 507, 755–771 (2020)
Skowron, A., Rauszer, C.: The discernibility matrices and functions in information systems. In: Słowiński, R. (ed.) Intelligent Decision Support - Handbook of Advances and Applications of the Rough Set Theory, pp. 331–362. Kluwer Academic Publishers, Berlin (1992)
Yao, Y.Y.: Three-way decisions with probabilistic rough sets. Inf. Sci. 180, 341–353 (2010)
Hu, M., Yao, Y.: Structured approximations as a basis for three-way decisions in rough set theory. Knowl.-Based Syst. 165, 92–109 (2019)
Acknowledgment
The authors are grateful to the anonymous referees for their useful comments. This work is supported by JSPS (Japan Society for the Promotion of Science) KAKENHI Grant Number JP20K11954.