Elsevier

Information Sciences

Volume 181, Issue 11, 1 June 2011, Pages 2086-2110
Positive approximation and converse approximation in interval-valued fuzzy rough sets

https://doi.org/10.1016/j.ins.2011.01.033

Abstract

Methods of fuzzy rule extraction based on rough set theory have rarely been reported for incomplete interval-valued fuzzy information systems. This paper therefore deals with such systems. Obtaining rules by attribute reduction may have a negative effect on inducing good rules, so the objective of this paper is to extract rules without computing attribute reducts. A method for completing missing attribute values is presented first. Positive and converse approximations in interval-valued fuzzy rough sets are then defined, and their important properties are discussed. Two rule extraction algorithms based on these approximations are proposed: mine rules based on the positive approximation (MRBPA) and mine rules based on the converse approximation (MRBCA). The two algorithms are evaluated on several data sets from the UC Irvine Machine Learning Repository. The experimental results show that MRBPA and MRBCA achieve better classification performance than the method based on attribute reduction.

Introduction

A basic issue in a rule-based system is extracting rules for classification or inference. The rough set approach uses only internal knowledge, avoids external parameters, and does not rely on prior model assumptions such as probabilistic distribution in statistical methods and basic probability assignment in the Dempster–Shafer theory. Its basic idea is to search for an optimal attribute set to generate rules through an objective knowledge induction process.

The classical rough set theory developed by Pawlak [24], [25] is used only to describe crisp sets. We are interested in extending the rough set model of Pawlak in two ways. To describe crisp and fuzzy concepts, Dubois and Prade [5], [6] extended the basic idea of rough sets to a new model called fuzzy rough sets. This new model has proven to be a promising tool for pattern recognition, data mining, and knowledge discovery [1], [2], [3], [4], [5], [6], [7], [9], [13], [14], [15], [16], [17], [22], [23], [26], [27], [28], [29], [32], [33], [34], [35], [36], [37], [38], [39], [40], [43], [44], [45], [46], [47], [49], [51], [52], [53]. In addition, a practical database may contain symbolic values, real values, or interval values [34]. For example, data such as current, ID, temperature, time, and voltage are often described by interval values. The traditional fuzzy rough set theory cannot deal effectively with these kinds of data, so extending the rough set theory of Pawlak to wider applications is necessary. Thus, the model of interval-valued fuzzy rough (IVFR) sets was developed [7], [38]. Here, we review two studies in this domain.

Sun et al. [38] defined IVFR sets and presented the attribute reduction method, which addresses interval-valued fuzzy information systems with both crisp condition and interval-valued fuzzy decision attributes. The reduction process has three steps: (1) computing the discernibility matrix of the information system; (2) searching the consistent set of the condition attribute set; and (3) obtaining a reduct by computing the minimum consistent set. However, the definition of the discernibility matrix is the same as that in the rough set theory by Pawlak, which means the discernibility matrix is effective only for nominal attributes. When the condition attributes are numerical or fuzzy interval values, the reduction method is ineffective because it cannot compute the discernibility matrix.
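The limitation described above can be made concrete with a minimal sketch of a classical (Pawlak-style) discernibility matrix. The function name, the toy table, and its attribute names are hypothetical and are not taken from [38]; the sketch only illustrates why the construction relies on an exact equality test between attribute values.

```python
from itertools import combinations


def discernibility_matrix(objects, condition_attrs, decision_attr):
    """Map each pair (i, j) of objects with different decisions to the set
    of condition attributes on which the two objects take different values."""
    matrix = {}
    for i, j in combinations(range(len(objects)), 2):
        x, y = objects[i], objects[j]
        if x[decision_attr] != y[decision_attr]:
            # Exact inequality is meaningful for nominal values only.
            matrix[(i, j)] = {a for a in condition_attrs if x[a] != y[a]}
    return matrix


# Hypothetical toy decision table with nominal condition attributes.
table = [
    {"outlook": "sunny", "wind": "weak", "play": "no"},
    {"outlook": "sunny", "wind": "strong", "play": "no"},
    {"outlook": "rain", "wind": "weak", "play": "yes"},
    {"outlook": "rain", "wind": "strong", "play": "no"},
]

dm = discernibility_matrix(table, ["outlook", "wind"], "play")
for pair, attrs in sorted(dm.items()):
    print(pair, sorted(attrs))
```

The test `x[a] != y[a]` carries no notion of similarity, so it does not transfer directly to numerical or interval-valued condition attributes, which is the gap noted in the paragraph above.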

Gong et al. [7] proposed a knowledge discovery method for interval-valued fuzzy information systems. The method classifies each object in a decision class according to its maximal membership represented by a fuzzy interval. However, the method is designed only for interval-valued fuzzy information systems with crisp condition attributes and interval-valued fuzzy decision attributes. Moreover, if the condition attribute set includes m attributes, then the antecedent of every rule must include m conditions; so many conditions may reduce the classification accuracy and the applicability of the rules. Finally, two memberships represented by fuzzy intervals are incomparable when one interval is nested in the other, so rules cannot be generated in this case.
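The nesting problem can be illustrated with a short sketch. It assumes the componentwise partial order that is commonly used for interval-valued memberships (an interval is smaller when both its endpoints are smaller); the source does not state which order [7] uses, and the function names and membership values below are hypothetical.

```python
def interval_leq(a, b):
    """Componentwise order: [a_lo, a_hi] <= [b_lo, b_hi] iff a_lo <= b_lo and a_hi <= b_hi."""
    return a[0] <= b[0] and a[1] <= b[1]


def comparable(a, b):
    """True if the two intervals are ordered one way or the other."""
    return interval_leq(a, b) or interval_leq(b, a)


def is_nested(inner, outer):
    """True if inner lies inside outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Hypothetical interval-valued memberships of one object in three decision classes.
m1 = (0.2, 0.8)
m2 = (0.4, 0.6)  # strictly nested in m1
m3 = (0.5, 0.9)

print(is_nested(m2, m1), comparable(m1, m2))  # nested, hence not comparable
print(comparable(m1, m3))                     # both endpoints larger, hence comparable
```

Under this order, a strictly nested pair such as m1 and m2 has no maximum, so a procedure that selects the class with maximal membership cannot decide between them.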

Aside from [7], [38], few studies on fuzzy rule extraction are based on rough sets in interval-valued fuzzy information systems. Establishing a more practical model for fuzzy rule extraction in interval-valued fuzzy information systems is necessary. The model should satisfy the following three requirements. First, it can be applied to three types of interval-valued fuzzy information systems, namely, (1) crisp condition and interval-valued fuzzy decision, (2) interval-valued fuzzy condition and crisp decision, and (3) interval-valued fuzzy condition and decision. Second, the computational complexity of the model should be relatively low. Third, rules can be generated when one interval is nested in the other.

Attribute reduction usually serves as a preparatory step before rule extraction [24]; its objective is to reduce the number of attributes and thus the complexity of the rule extraction process. Various attribute reduction methods have been proposed for rough sets and fuzzy rough sets [2], [4], [12], [13], [14], [15], [16], [23], [37], [39], [40], [43], [44], [45], [52], [53]. Because the IVFR set theory generalizes the traditional fuzzy rough set theory, extracting rules based on attribute reduction would be natural. However, this paper does not extract rules based on attribute reduction, for the following reasons. Attribute reduction methods can be classified into three types: those based on the positive region [2], [15], [16], [37], those based on the discernibility matrix [39], [40], [43], [50], and those based on entropy [12], [13], [14]. For example, Shen and Jensen [16], [37] conducted pioneering studies on attribute reduction based on the positive region and proposed an attribute reduction algorithm. However, an obvious limitation is that the algorithm may not converge on many real data sets, or the selected attributes may be unreliable. Moreover, the computational complexity of the algorithm often increases exponentially with the number of samples and attributes [2]. Bhatt and Gopal [2] improved Shen’s algorithm by refining the definition of the lower approximation on a compact computational domain. However, the degree of dependency of a selected reduct may be larger than that of the entire attribute set because of the way the positive region is computed [40]. This is unreasonable because more attributes should offer better approximations in a rough set framework [40]. Tsang et al. [39], [40] proposed an algorithm that uses a discernibility matrix to compute all attribute reducts. However, computing all reducts is NP-hard [40]. Hu et al. [12], [13] proposed an attribute reduction method based on information entropy. In that method, the attribute reduction concept is not constructed using existing fuzzy approximation operators [47], and studying the structure of attribute reduction is difficult [49]. Each attribute reduction method has its own characteristics and flaws. Therefore, rule extraction based on attribute reduction may be faulty. This paper avoids the attribute reduction process: it establishes the structure of the approximation by introducing a granulation order and then extracts rules based on it.

From the viewpoint of granular computing, a concept in the IVFR set defined by Sun [38] is described by the upper and lower approximations under static granulation. Because the granulation cannot change, the description is unsatisfactory when the granulation is too fine or too coarse: excessively fine granulation may increase time and cost, while excessively coarse granulation may not satisfy requirements. We therefore consider describing a concept under dynamic granulation, meaning that a proper granulation family can be selected to describe a target concept according to the practical requirement.

Granulation order in sets was introduced by Qian and co-workers [20], [30]. In our study, a granulation order is extended to fuzzy information systems. A positive granulation order is defined by adding one condition attribute at a time, which naturally defines a positive approximation space. Given a positive approximation space, a fuzzy concept can be described by the upper and lower approximations. Based on the positive approximation, a rule extraction algorithm called mine rules based on the positive approximation (MRBPA) is proposed. It is characterized by a gradually dwindling universe and a monotonically increasing approximation precision as the positive granulation order becomes longer. Thus, the computational complexity of the algorithm can be reduced effectively. Similarly, a converse granulation order is defined by deleting one condition attribute at a time, which defines a converse approximation space. Given a converse approximation space, a fuzzy concept can be described by the upper and lower approximations. As an application of the converse approximation, an algorithm called mine rules based on the converse approximation (MRBCA) is proposed for rule extraction. The main characteristic of MRBCA is that much simpler rules can be extracted while the approximation precision is kept invariant.
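The two granulation orders can be illustrated in the crisp (Pawlak) setting, which is much simpler than the interval-valued fuzzy setting of this paper. The sketch below is not MRBPA or MRBCA. It assumes equivalence classes induced by equal attribute values and uses the classical precision |lower| / |upper|; all function names and the toy table are hypothetical.

```python
from collections import defaultdict


def partition(objects, attrs):
    """Group object indices into blocks of objects that agree on all attrs."""
    blocks = defaultdict(set)
    for i, obj in enumerate(objects):
        blocks[tuple(obj[a] for a in attrs)].add(i)
    return list(blocks.values())


def precision(objects, attrs, target):
    """Classical approximation precision |lower| / |upper| of a crisp target set."""
    lower, upper = set(), set()
    for block in partition(objects, attrs):
        if block <= target:
            lower |= block
        if block & target:
            upper |= block
    return len(lower) / len(upper) if upper else 1.0


def positive_order_precisions(objects, attr_order, target):
    """Precision after adding one attribute at a time (coarse to fine)."""
    return [precision(objects, attr_order[:k], target)
            for k in range(1, len(attr_order) + 1)]


def converse_order(objects, attrs, target):
    """Greedily delete one attribute at a time while the precision is unchanged."""
    kept = list(attrs)
    base = precision(objects, kept, target)
    for a in reversed(attrs):
        trial = [x for x in kept if x != a]
        if trial and precision(objects, trial, target) == base:
            kept = trial
    return kept


# Hypothetical toy table; the target concept is the object set {0, 2, 3}.
table = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 0, "c": 1},
    {"a": 0, "b": 1, "c": 0},
    {"a": 1, "b": 0, "c": 0},
    {"a": 1, "b": 1, "c": 1},
]
target = {0, 2, 3}

precisions = positive_order_precisions(table, ["a", "b", "c"], target)
kept = converse_order(table, ["a", "b", "c"], target)
print(precisions)  # precision never decreases as attributes are added
print(kept)        # attributes that remain after the converse pass
```

In this toy case the precision rises from 0.0 to 0.5 to 1.0 along the positive order, and the converse pass shows that a single attribute already yields the full precision, which mirrors the idea of obtaining simpler rules at an unchanged precision.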

The rest of this paper is organized as follows. Section 2 briefly introduces interval-valued fuzzy sets and IVFR sets. In Section 3, an algorithm called completeness of missing attribute values in interval-valued fuzzy information systems (CMAVIFIS) is presented for data completeness in interval-valued fuzzy information systems. In Section 4, the positive approximation is proposed, and its important properties are obtained. A rule extraction algorithm called MRBPA based on the positive approximation is then designed and illustrated with an example. In Section 5, the converse approximation is presented, and useful properties are deduced. The convergence degree of an interval-valued fuzzy set is defined and proven to increase in a converse granulation order. A new rule extraction algorithm called MRBCA based on the converse approximation is proposed and illustrated. In Section 6, the performances of CMAVIFIS, MRBPA, and MRBCA are evaluated on several data sets from the UC Irvine Machine Learning Repository (UCI). Section 7 concludes the paper.

Section snippets

Preliminaries

In this section, we briefly review the basic concepts of interval-valued fuzzy sets and IVFR sets.

Data completeness in interval-valued fuzzy information systems

Data completeness in interval-valued fuzzy information systems is the usual prerequisite for rule extraction. The process of converting an incomplete interval-valued fuzzy information system into a complete one, i.e., complementing the missing attribute values with specified values, is called the completeness of incomplete interval-valued fuzzy information systems.

Multiple completeness methods have been proposed [10], [11], [19], [31], [42], [54]. A simple method is to either delete objects

Positive approximation in IVFR sets

From the viewpoint of granular computing, a concept in the IVFR set defined by Sun [38] is described under static granulation, i.e., a certain interval-valued fuzzy equivalence relation. However, we usually need to analyze and solve problems from multiple views and at multiple levels. Consider an extreme case. Suppose we select an interval-valued fuzzy equivalence relation R with the finest granulation; i.e., each fuzzy block contains only one object. An interval-valued fuzzy set F can be effectively expressed

Converse approximation in IVFR sets

The positive approximation approaches a target concept by the change in granulation. Due to the positive approximation, the approximation precision αP(F) increases as the positive granulation order becomes longer, and a family of fuzzy rules with granulation changing from coarse to fine can be obtained. However, in some applications, the approximation precision is restricted by the decision requirements or preference of decision makers [30]. An obvious problem is extracting simpler rules based

Experimental analysis

In this section, we first evaluate the performance of CMAVIFIS. We then compare the computing time and classification accuracy of MRBPA, MRBCA, and the fuzzy rule induction algorithm (RIA) on different data sets.

We download several data sets from the UCI Machine Learning database [55] to test our proposed methods. The data sets are outlined in Table 3. In the seven sets, two have a continuous class attribute, while the others have a categorical class attribute. Furthermore, the number of

Conclusions

This paper presents two fuzzy rule extraction methods for interval-valued fuzzy information systems. The main features of the methods cover four aspects. (1) Rule extraction is based on a granulation order, thus the adverse effects of attribute reduction are excluded as much as possible. (2) They can be applied to three types of interval-valued fuzzy information systems (i.e., crisp condition and interval-valued fuzzy decision, interval-valued fuzzy condition and crisp decision, and

Acknowledgments

The authors would like to thank Professor Witold Pedrycz, the Editor-in-Chief, the anonymous reviewers, Dr. Yan Zhao, Dr. Feifei Xu, and Feng Luo for their valuable comments and suggestions. This work was supported by the National Natural Science Foundation of China (Nos. 60475019, 60775036) and the Doctoral Program of Higher Education (No. 20060247039).

References (55)

  • Z. Pawlak et al. Rudiments of rough sets. Information Sciences (2007)
  • Y.H. Qian et al. Consistency measure, inclusion degree and fuzzy measure in decision tables. Fuzzy Sets and Systems (2008)
  • Y.H. Qian et al. On the evaluation of the decision performance of an incomplete decision table. Data & Knowledge Engineering (2008)
  • Y.H. Qian et al. Measures for evaluating the decision performance of a decision table in rough set theory. Information Sciences (2008)
  • Y.H. Qian et al. Interval ordered information systems. Computers & Mathematics with Applications (2008)
  • Y.H. Qian et al. Converse approximation and rule extraction from decision tables in rough set theory. Computers & Mathematics with Applications (2008)
  • A.M. Radzikowska et al. A comparative study of fuzzy rough sets. Fuzzy Sets and Systems (2002)
  • Q. Shen et al. A rough-fuzzy approach for generating classification rules. Pattern Recognition (2002)
  • Q. Shen et al. Selecting informative features with fuzzy-rough sets and its application for complex systems monitoring. Pattern Recognition (2004)
  • B.Z. Sun et al. Fuzzy rough set theory for the interval-valued fuzzy information systems. Information Sciences (2008)
  • X.Z. Wang et al. Learning fuzzy rules from fuzzy samples based on rough set technique. Information Sciences (2007)
  • X.Z. Wang et al. Induction of multiple fuzzy decision trees based on rough set technique. Information Sciences (2008)
  • Y.F. Wang. Mining stock price using fuzzy rough set system. Expert Systems with Applications (2003)
  • Y.F. Yuan et al. Introduction of fuzzy decision tree. Fuzzy Sets and Systems (1995)
  • S.Y. Zhao et al. On fuzzy approximation operators in attribute reduction with fuzzy rough sets. Information Sciences (2008)
  • L. Zhou et al. On generalized intuitionistic fuzzy rough approximation operators. Information Sciences (2008)
  • C. Cornelis et al. Feature selection with fuzzy decision reducts
