Learning rules from incomplete training examples by rough sets

doi:10.1016/S0957-4174(02)00016-7

Expert Systems with Applications

Volume 22, Issue 4, May 2002, Pages 285-293

https://doi.org/10.1016/S0957-4174(02)00016-7

Abstract

Machine learning can extract desired knowledge from existing training examples and ease the development bottleneck in building expert systems. Most learning approaches derive rules from complete data sets. If some attribute values are unknown in a data set, it is called incomplete. Learning from incomplete data sets is usually more difficult than learning from complete data sets. Rough-set theory has been widely used for data classification problems. In this paper, we deal with the problem of producing a set of certain and possible rules from incomplete data sets based on rough sets. A new learning algorithm is proposed that simultaneously derives rules from incomplete data sets and estimates the missing values during the learning process. Unknown values are first assumed to be any possible value and are gradually refined according to the incomplete lower and upper approximations derived from the given training examples. The examples and the approximations then interact with each other to derive certain and possible rules and to estimate appropriate values for the unknowns. The derived rules can then serve as knowledge about the incomplete data set.

Introduction

Expert systems have been widely used in domains where mathematical models cannot be easily built, human experts are not available, or the cost of querying an expert is high. Although a wide variety of expert systems have been built, knowledge acquisition remains a development bottleneck. Usually, a knowledge engineer is needed to establish a dialog with a human expert and to encode the elicited knowledge into a knowledge base to produce an expert system. The process is, however, very time-consuming (Buchanan and Shortliffe, 1984; Giarratano and Riley, 1989). Shortening the development time is therefore the most important factor for the success of an expert system.

Recently, machine-learning techniques have been developed to ease the knowledge-acquisition bottleneck. Among the proposed approaches, deriving rules from training examples is the most common (Hong et al., 2001; Hong et al., 2000; Kodratoff and Michalski, 1983; Michalski et al., 1983; Michalski et al., 1984; Tsumoto, 1998). Given a set of examples, a learning program tries to induce rules that describe each class.

Most learning approaches derive rules from complete data sets. If some attribute values are unknown in a data set, it is called incomplete. Learning from incomplete data sets is usually more difficult than learning from complete data sets. Designing a learning algorithm that can deal with incomplete data sets remains a challenge for researchers in this field. Several methods have been proposed to handle incomplete data sets (Chmielewski et al., 1993; Slowinski and Stefanowski, 1989; Slowinski and Stefanowski, 1994). For example, incomplete data sets may first be transformed into complete data sets (for instance, by a similarity measure) before learning programs begin (Chmielewski et al., 1993), objects with unknown values may be removed from the data sets outright (Chmielewski et al., 1993), or objects with unknown values may be processed in a particular way (Kryszkiewicz, 1998; Liang and Xu, 2000).

The rough-set theory was proposed by Pawlak in 1982 (Pawlak, 1982; Pawlak, 1996) and has been used in reasoning and knowledge acquisition for expert systems (Grzymala-Busse, 1988; Orlowska, 1994). It uses the concept of equivalence classes as its basic principle. Several applications and extensions of the rough-set theory have been proposed. Examples are Orlowska's (1994) reasoning with incomplete information, Germano and Alexandre's (1996) knowledge-base reduction, Lingras and Yao's (1998) data mining, and Zhong, Dong, Ohsuga, and Lin's (1998) rule discovery. Because of the success of the rough-set theory in knowledge acquisition, many researchers in the database and machine-learning fields have taken an interest in it, since it offers opportunities to discover useful information in training examples.

In this paper, we deal with the problem of producing a set of certain and possible rules from incomplete data sets in a different way. We propose a new learning approach based on rough sets that simultaneously derives rules from incomplete data sets and estimates the missing values during the learning process. Unknown values are first assumed to be any possible value and are gradually refined according to the incomplete lower and upper approximations derived from the given training examples. The examples and the approximations then interact with each other to derive certain and possible rules and to estimate appropriate values for the unknowns.

The remainder of this paper is organized as follows. The rough-set theory is briefly reviewed in Section 2. Kryszkiewicz's approach for managing incomplete data sets is described in Section 3. The definitions used in this paper are given in Section 4. A novel learning algorithm based on the rough-set theory, which simultaneously induces rules and estimates unknown values from incomplete data sets, is proposed in Section 5. An example illustrating the proposed algorithm is given in Section 6. Conclusions and future work are given in Section 7.

Section snippets

Review of the rough-set theory

The rough-set theory, proposed by Pawlak in 1982 (Pawlak, 1982; Pawlak, 1996), can serve as a mathematical tool for dealing with data classification problems. It adopts the concept of equivalence classes to partition training instances according to given criteria. Two kinds of partitions are formed in the mining process: lower approximations and upper approximations, from which certain and possible rules can easily be derived.
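As an illustration of these ideas on a complete data set, the following minimal sketch groups objects into equivalence classes and computes the lower and upper approximations of a target class using the standard rough-set definitions. The function names and the toy data are assumptions made for this example and are not taken from the paper's tables.

```python
from collections import defaultdict

def equivalence_classes(objects, attributes):
    """Group object names that share the same values on `attributes`."""
    classes = defaultdict(set)
    for name, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(name)
    return list(classes.values())

def lower_approximation(classes, target):
    """Union of the equivalence classes fully contained in `target`."""
    result = set()
    for cls in classes:
        if cls <= target:
            result |= cls
    return result

def upper_approximation(classes, target):
    """Union of the equivalence classes that overlap `target`."""
    result = set()
    for cls in classes:
        if cls & target:
            result |= cls
    return result

# Hypothetical toy data (not the paper's Table 1 or Table 2).
objects = {
    "o1": {"SP": "L", "DP": "N", "BP": "N"},
    "o2": {"SP": "L", "DP": "N", "BP": "N"},
    "o3": {"SP": "H", "DP": "N", "BP": "H"},
    "o4": {"SP": "H", "DP": "N", "BP": "N"},
    "o5": {"SP": "H", "DP": "H", "BP": "H"},
}
classes = equivalence_classes(objects, ["SP", "DP"])
high_bp = {name for name, row in objects.items() if row["BP"] == "H"}

lower = lower_approximation(classes, high_bp)  # objects certainly in the class
upper = upper_approximation(classes, high_bp)  # objects possibly in the class
```

In this toy data, objects in the lower approximation would support certain rules, while objects that appear only in the upper approximation would support possible rules.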

Formally, let U be a set of training examples (objects), A be a

Incomplete data sets

Data sets can be roughly classified into two classes: complete and incomplete data sets. All the objects in a complete data set have known attribute values. If at least one object in a data set has a missing value, the data set is incomplete. Table 2 shows an example of an incomplete data set.

In Table 2, the symbol ‘∗’ denotes an unknown attribute value. Thus, the SP values of Obj⁽⁵⁾ and Obj⁽⁹⁾ are unknown. Similarly, the DP value of Obj⁽⁷⁾ is unknown. The data set is thus incomplete.
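A minimal sketch of this distinction, assuming a dictionary-based representation in which the string "*" marks an unknown value as in the text. The rows below are hypothetical and do not reproduce Table 2; the helper names are likewise assumptions.

```python
UNKNOWN = "*"  # the symbol the text uses for an unknown attribute value

def missing_cells(dataset, attributes):
    """Return (object, attribute) pairs whose value is unknown."""
    return [
        (name, attr)
        for name, row in dataset.items()
        for attr in attributes
        if row[attr] == UNKNOWN
    ]

def is_incomplete(dataset, attributes):
    """A data set is incomplete if at least one attribute value is unknown."""
    return len(missing_cells(dataset, attributes)) > 0

# Hypothetical rows; the actual contents of Table 2 are not reproduced here.
dataset = {
    "obj1": {"SP": "L", "DP": "N", "BP": "N"},
    "obj2": {"SP": "*", "DP": "N", "BP": "N"},
    "obj3": {"SP": "H", "DP": "*", "BP": "H"},
}
gaps = missing_cells(dataset, ["SP", "DP"])
incomplete = is_incomplete(dataset, ["SP", "DP"])
```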

Learning

Definitions

Since an incomplete data set contains unknown attribute values, the original equivalence relation in rough sets must be modified to manage it. As before, an equivalence class is formed for each attribute value or for each value combination of attributes. In this paper, each object is represented as a tuple (obj, symbol), where the symbol may be certain (c) or uncertain (u). If an object obj⁽ⁱ⁾ has a certain value v_j⁽ⁱ⁾ for attribute A_j, then (obj⁽ⁱ⁾, c) is put in the equivalence class for v_j⁽ⁱ⁾

A rough-set-based approach to simultaneously estimate missing values and derive rules

In this section, a new learning algorithm based on rough sets is proposed that simultaneously estimates the missing values and derives certain and possible rules from incomplete data sets. As mentioned before, each object is represented as a tuple (obj, symbol), where the symbol may be certain (c) or uncertain (u). If an object has a missing value for an attribute, it is first put into each incomplete equivalence class of that attribute.
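The initial placement of objects can be sketched as follows. This is only an illustration of the (obj, symbol) representation described above, not the paper's full algorithm: it assumes that the possible values of an attribute are those observed among the known entries, and the function name and toy data are hypothetical.

```python
def incomplete_equivalence_classes(dataset, attribute, unknown="*"):
    """Build one class per known value of `attribute`.

    Objects with a known value enter the class of that value tagged "c"
    (certain). Objects with an unknown value enter every class tagged "u"
    (uncertain). Assumption: candidate values are the observed known values.
    """
    known_values = sorted(
        {row[attribute] for row in dataset.values() if row[attribute] != unknown}
    )
    classes = {value: [] for value in known_values}
    for name, row in dataset.items():
        if row[attribute] == unknown:
            for value in known_values:
                classes[value].append((name, "u"))
        else:
            classes[row[attribute]].append((name, "c"))
    return classes

# Hypothetical toy data (not the paper's Table 2).
dataset = {
    "obj1": {"SP": "L"},
    "obj2": {"SP": "H"},
    "obj3": {"SP": "*"},
}
sp_classes = incomplete_equivalence_classes(dataset, "SP")
```

Here obj3 appears as uncertain in both the class for "L" and the class for "H"; a later refinement step would be expected to remove it from classes where it does not fit.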

The algorithm then calculates incomplete lower

An example

The incomplete data set in Table 2 (in Section 3) is used to demonstrate how the proposed algorithm can simultaneously estimate missing values and derive certain and possible rules. There are seven objects and two attributes SP and DP with some missing values in the data set. Three classes for BP are to be classified. The proposed learning algorithm processes this incomplete data set as follows.

Step 1. Since three classes exist in the incomplete data set, three partitions are formed as follows: $X$
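Forming one partition per class, as in Step 1, can be sketched as below. The rows and class labels are hypothetical stand-ins, since the contents of Table 2 and the resulting partitions are not reproduced here; the function name is also an assumption.

```python
from collections import defaultdict

def partition_by_class(dataset, class_attribute):
    """Group object names by the value of the class attribute."""
    partitions = defaultdict(set)
    for name, row in dataset.items():
        partitions[row[class_attribute]].add(name)
    return dict(partitions)

# Hypothetical rows with three BP classes (not the paper's Table 2).
dataset = {
    "obj1": {"SP": "L", "DP": "N", "BP": "L"},
    "obj2": {"SP": "*", "DP": "N", "BP": "N"},
    "obj3": {"SP": "H", "DP": "*", "BP": "H"},
    "obj4": {"SP": "H", "DP": "H", "BP": "H"},
}
partitions = partition_by_class(dataset, "BP")
```

Unknown condition-attribute values do not affect this step, because the partition depends only on the class attribute.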

Conclusion and future work

In this paper, we have proposed a new learning approach to derive rules from incomplete data sets based on the rough-set theory. The proposed approach is different from others in that it can derive rules and estimate the missing values at the same time. The incomplete lower and upper approximations have been defined for managing uncertain objects in incomplete data sets. The interaction between data and approximations helps derive certain and possible rules from incomplete data sets and

References (21)

Kryszkiewicz, M. (1998). Rough set approach to incomplete information systems. Information Sciences.
Slowinski, R., et al. (1989). Rough classification in incomplete information systems. Mathematical and Computer Modelling.
Tsumoto, S. (1998). Extraction of experts' decision rules from clinical databases using rough set model. Intelligent Data Analysis.
Ziarko, W. (1993). Variable precision rough set model. Journal of Computer and System Sciences.
Buchanan, B. G., et al. (1984). Rule-based expert systems: The MYCIN experiments of the Stanford Heuristic Programming Project.
Chmielewski, M. R., et al. (1993). The rule induction system LERS - a version for personal computers. Foundations of Computing and Decision Sciences.
Germano, L. T., & Alexandre, P. (1996). Knowledge-base reduction based on rough-set techniques. The Canadian Conference...
Giarratano, J. C., et al. (1989). Expert systems: Principles and programming.
Grzymala-Busse, J. W. (1988). Knowledge acquisition under uncertainty: A rough set approach. Journal of Intelligent and Robotic Systems.
Hong, T. P., et al. (2000). Knowledge acquisition from quantitative data using the rough-set theory. Intelligent Data Analysis.



