Decision rule mining using classification consistency rate

doi:10.1016/j.knosys.2013.01.010

Knowledge-Based Systems

Volume 43, May 2013, Pages 95-102

https://doi.org/10.1016/j.knosys.2013.01.010

Abstract

Decision rule mining is an important technique in many applications. In this paper, we propose a new rough set approach to rule induction based on a significance measure called the classification consistency rate. The approach performs rule induction from the viewpoint of attributes rather than descriptors. The proposed algorithm is tested and compared with the LEM2 algorithm on several real-life data sets to which different levels of inconsistent data have been added. The results show that the proposed algorithm is effective for rule induction from inconsistent data.

Introduction

Rule induction is one of the most important techniques in machine learning, expert systems, knowledge discovery and data mining. Many inductive learning methods, such as decision tree induction [1], [2], rule induction methods [3], [4], [5], [6], [7] and rough set theory [8], [9], [10], [11], [12], have been introduced and applied to extract knowledge from databases.

Rough set theory, introduced by Pawlak, is a useful mathematical approach for dealing with vague and uncertain information. It has attracted the attention of many researchers, who have studied its theory and applications over the last decades [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33].

A number of rule induction approaches based on rough set theory have been proposed. They can be broadly divided into two categories. The first is exhaustive. Skowron [34], [35] proposed well-known rule induction approaches based on the discernibility matrix and Boolean reasoning. For a particular clinical application, Tsumoto [27], [28] introduced a rule induction approach called PRIMEROSE, which extracts not only classification rules but also other medical knowledge needed for diagnosis. Its induction algorithm consists of two procedures: an exhaustive search procedure that induces exclusive rules over all attribute–value pairs, and a postprocessing procedure that induces inclusive rules through combinations of all attribute–value pairs. The second category is heuristic. Grzymala-Busse proposed the well-known LEM2 algorithm in LERS [9], [36], [37], [38], [39], a representative approach to rule induction with rough set theory. LEM2 explores the search space of all attribute–value pairs (called descriptors) and, at each step, selects the pair that covers the largest number of objects to build a decision rule. In other words, it performs rule induction from the viewpoint of descriptors.

However, in some cases, rule induction from the viewpoint of attributes is more reasonable. For example, users may be interested in rules about a specified attribute or attribute set. Based on this observation, in this paper we adopt the viewpoint of the attribute space rather than the descriptor space for rule induction. This viewpoint lets us study the data at the understandable semantic level of attributes. From this viewpoint, we propose a new strategy for inducing decision rules from decision systems; it searches the attribute space using the classification consistency rate, a concept defined in this paper. The proposed rule mining strategy is denoted DRICA (Decision Rule mIning using Consistency rAte).

The remainder of the paper is organized as follows. Some preliminaries about rough set theory are reviewed in Section 2. In Section 3, the algorithm of rule induction based on classification consistency is introduced. An example is used to illustrate the process of generating rules in Section 4. Experiments on several data sets are conducted to test the proposed approach, and the results are compared in Section 5. Section 6 concludes the paper.

Section snippets

Rough set theory

In this section, we first review some basic notions of rough set theory; further details can be found in [8].

Classification consistency rate

Given a decision table DT = (U, C ∪ D, V, f), let us consider the quantity $\| \mathrm{POS}_P(D) \|$

It represents the number of objects that can be classified by the attribute set P ⊆ C.
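For reference, the positive region that appears in the quantity above is usually defined as follows in standard rough set theory. The paper reviews this notion in its Section 2, which is not reproduced here, so the symbols $U/D$ (the partition of U into decision classes), $[x]_P$ (the equivalence class of x under P) and $\underline{P}X$ (the lower approximation of X) are the conventional ones and are assumed rather than quoted from the paper:

$$\mathrm{POS}_P(D) = \bigcup_{X \in U/D} \underline{P}X, \qquad \underline{P}X = \{\, x \in U : [x]_P \subseteq X \,\}$$

Under this definition, an object belongs to the positive region exactly when every object sharing its values on P also shares its decision value.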

Definition 2

Given a decision table DT = (U, C ∪ D, V, f), the classification consistency rate relative to an attribute set P ⊆ C is defined as: $\mathrm{CCR}_{P,D} = \frac{\| \mathrm{POS}_P(D) \|}{\| U \|}$

The classification consistency rate $\mathrm{CCR}_{P,D}$ describes the classification ability of the attribute set P ⊆ C relative to the decision attribute D.
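The definition above can be illustrated with a minimal Python sketch. It assumes the standard rough set notion of the positive region (objects whose equivalence class under P has a single decision value). The function names `positive_region` and `ccr` and the toy table are hypothetical, chosen for illustration; they are not the paper's Table 1 or Table 2, and this is not the DRICA algorithm itself.

```python
from collections import defaultdict

def positive_region(rows, cond_attrs, dec_attr):
    """Return ids of objects whose equivalence class under cond_attrs
    carries a single decision value (standard positive region; assumed)."""
    blocks = defaultdict(list)
    for obj_id, row in rows.items():
        # Objects with equal values on every attribute in P share a block.
        key = tuple(row[a] for a in cond_attrs)
        blocks[key].append(obj_id)
    pos = set()
    for members in blocks.values():
        decisions = {rows[m][dec_attr] for m in members}
        if len(decisions) == 1:  # block is consistent w.r.t. the decision
            pos.update(members)
    return pos

def ccr(rows, cond_attrs, dec_attr):
    """Classification consistency rate: |POS_P(D)| / |U|."""
    return len(positive_region(rows, cond_attrs, dec_attr)) / len(rows)

# Hypothetical toy decision table (object id -> attribute values).
table = {
    1: {"a": 0, "b": 0, "d": "yes"},
    2: {"a": 0, "b": 1, "d": "no"},
    3: {"a": 1, "b": 0, "d": "yes"},
    4: {"a": 1, "b": 0, "d": "no"},
}

for attrs in (["a"], ["b"], ["a", "b"]):
    print(attrs, ccr(table, attrs, "d"))
```

In this toy table, objects 3 and 4 agree on both condition attributes but differ on the decision, so they never enter the positive region, and the rate for {a, b} stays below 1.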

Example 1

Table 1 contains four objects U = {x₁, x₂, x₃, x₄}, three condition attributes C = {a, b, c},

An illustrative example

In this section, an example is given to show how the proposed algorithm can be used to generate rules from decision tables. The same example is also worked through step by step with LEM2 to show the difference between the two approaches. The data set is shown in Table 2.

Table 2 is a decision system with U = {1, 2, … , 9}, C = {a, b, c}, D = {d}. Since objects 6 and 8 have the same condition attribute values but different decision attribute values, this decision system is inconsistent. The inconsistent

Experiment study

In the last section, we used an example to describe how the proposed algorithm induces rules from an inconsistent data set. In this section, empirical experiments are conducted to test the proposed algorithm.

Conclusion

In this paper, we propose a rough set approach for rule induction based on the classification consistency rate, called DRICA. DRICA performs decision rule induction from the viewpoint of attributes rather than descriptors. In each iteration, it searches the attribute space instead of the descriptor space. It may obtain more than one rule in a single step. Since DRICA uses the classification consistency rate as the criterion for selecting attributes, the rules with confidence of 1 rank

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61070074 and 60703038).

References (41)

J. Cendrowska, PRISM: an algorithm for inducing modular rules, International Journal of Man-Machine Studies (1987)
R.S. Michalski, A theory and methodology of inductive learning, Artificial Intelligence (1983)
R.S. Michalski, A theory and methodology of inductive learning
Y.-S. Chen et al., A soft-computing based rough sets classifier for classifying IPO returns in the financial markets, Applied Soft Computing (2012)
J. Dai, Rough 3-valued algebras, Information Sciences (2008)
J. Dai et al., Uncertainty measurement for interval-valued decision systems based on extended conditional entropy, Knowledge-Based Systems (2012)
J. Dai et al., Approximations and uncertainty measures in incomplete information systems, Information Sciences (2012)
J. Dai et al., Attribute selection based on information gain ratio in fuzzy rough set theory with application to tumor classification, Applied Soft Computing (2013)
J. Dai et al., Attribute selection based on a new conditional entropy for incomplete decision systems, Knowledge-Based Systems (2013)
J. Derrac et al., Enhancing evolutionary instance selection algorithms by means of fuzzy rough set based feature selection, Information Sciences (2012)
K. Kaneiwa, A rough set approach to multiple dataset analysis, Applied Soft Computing (2011)
H.S. Own et al., A new weighted rough set framework based classification for Egyptian neonatal jaundice, Applied Soft Computing (2012)
M.L. Othman et al., Rough-set-based timing characteristic analyses of distance protective relay, Applied Soft Computing (2012)
S. Tsumoto, Extraction of experts decision rules from clinical databases using rough set model, Journal of Intelligent Data Analysis (1998)
W. Xu et al., Approaches to attribute reductions based on rough set and matrix computation in inconsistent ordered information systems, Knowledge-Based Systems (2012)
H.-L. Yang et al., Transformation of bipolar fuzzy rough set models, Knowledge-Based Systems (2012)
W. Wei et al., A comparative study of rough sets for hybrid data, Information Sciences (2012)
X. Zhang et al., A general frame for intuitionistic fuzzy rough sets, Information Sciences (2012)
W. Zhu et al., Reduction and axiomization of covering generalized rough sets, Information Sciences (2003)
L. Breiman et al., Classification and Regression Trees (1984)

