Constructing importance measure of attributes in covering decision table

doi:10.1016/j.knosys.2014.12.018

Knowledge-Based Systems

Volume 76, March 2015, Pages 228-239

Abstract

In rough set theory, the attribute importance measure is a crucial factor in applications of attribute reduction and feature selection. Many importance measure methodologies for discrete-valued information systems or decision tables have been developed. However, there are only limited studies on importance measurement for numerical-valued information systems. In this paper, a knowledge change-based importance measure, with the structural characteristics of a fuzzy measure, is introduced to evaluate the importance of attributes in a covering decision table. We first present the concept of a similarity block in attribute space, based on which coverings are induced to construct the lower and upper approximation operators in covering-based rough sets. In particular, the traditional importance measure is extended to deal with covering decision tables. Further, an evaluation model based on the knowledge change-based importance measure is constructed. Experiments are conducted on public data sets from UCI, and finally a case study on students’ overall evaluation is given. Theoretical analysis and experimental results show that the proposed importance measure is effective for evaluating the importance of attributes in a covering decision table.

Introduction

Rough set theory [1], proposed by Pawlak in the early 1980s, is a mathematical tool to deal with uncertainty and incomplete information. It is an important tool for knowledge discovery in information systems. Over the last decades, there has been much work on information systems with rough sets. Data mining based on rough sets is widely employed to obtain knowledge from databases. It has also found successful applications in attribute reduction, machine learning, pattern recognition, image processing and decision-making analysis on large data sets [2], [3], [4], [5], [6], [7].

In attribute reduction and feature selection methods, the attribute importance measure is a crucial factor. To evaluate the importance of attributes in an information system, several importance measures have been proposed in rough set theory. Many authors have studied the importance measure of attributes based on positive region, Shannon’s conditional entropy, complement conditional entropy, combination conditional entropy and rough entropy. The idea of attribute reduction using the positive region originated with Grzymala-Busse [8], [9]. In the classical rough set model, the conditional entropy of Shannon’s information entropy was introduced in [10], [11] to find the relative reduct of a decision table. Based on the complement entropy, its conditional entropy was used to measure the importance of attributes in [12]. Qian and Liang [13] presented combination entropy to measure the uncertainty of information systems and used its conditional entropy to obtain an attribute subset. Sun et al. [14] proposed a new rough entropy associated with partitions, and presented the importance measure of attributes on the basis of this entropy. Yao et al. [15], [16], [17] studied several kinds of information entropy measures for attribute importance in rough set theory. However, the studies mentioned above mainly focus on discrete-valued information systems or decision systems.
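As a concrete illustration of one of the entropy-based measures surveyed above, the following minimal sketch scores an attribute by how much Shannon's conditional entropy H(d | B) rises when that attribute is dropped from the condition set B. It is not the formulation of any particular cited paper: the function names, the row-as-dictionary layout and the toy data are assumptions made here for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(rows, cond_attrs, decision):
    """Shannon conditional entropy H(decision | cond_attrs) for a list of dict rows."""
    n = len(rows)
    # Group decision labels by the objects' values on the condition attributes.
    blocks = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        blocks[key].append(row[decision])
    h = 0.0
    for labels in blocks.values():
        p_block = len(labels) / n
        for count in Counter(labels).values():
            p = count / len(labels)
            h -= p_block * p * log2(p)
    return h


def entropy_significance(rows, attr, cond_attrs, decision):
    """Increase in H(d | B) caused by removing attr from B = cond_attrs."""
    reduced = [a for a in cond_attrs if a != attr]
    return (conditional_entropy(rows, reduced, decision)
            - conditional_entropy(rows, cond_attrs, decision))


# Hypothetical toy decision table: the decision d depends only on a2.
rows = [
    {"a1": 0, "a2": 0, "d": "no"},
    {"a1": 0, "a2": 1, "d": "yes"},
    {"a1": 1, "a2": 0, "d": "no"},
    {"a1": 1, "a2": 1, "d": "yes"},
]
sig_a1 = entropy_significance(rows, "a1", ["a1", "a2"], "d")
sig_a2 = entropy_significance(rows, "a2", ["a1", "a2"], "d")
```

In this toy table, removing a2 leaves the decision completely undetermined within each a1 block, so a2 receives a significance of one bit, while removing a1 changes nothing and a1 receives zero. Because the grouping step relies on exact value equality, this kind of measure presupposes discrete-valued attributes.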

Traditional rough set theory is suitable for discrete data rather than continuous data, since only partitions or equivalence relations are considered. As pointed out by some researchers, the partition or equivalence relation in traditional rough set theory is too restrictive for practical applications. Many practical data sets, in which some attributes are numerical, cannot be handled by partitions. To address this issue, some scholars extended equivalence relations to similarity relations [18], tolerance relations [19], [20], and even general binary relations [21], [22]. One example extension is the neighborhood-based rough set model. Lin [23] pointed out that neighborhood spaces are more general topological spaces than equivalence spaces, and introduced the neighborhood relation into rough set theory. The properties of neighborhood approximation spaces were discussed in [24]. Hu et al. [25], [26] constructed a unified theoretical framework for the neighborhood-based rough set model and a feature selection algorithm for hybrid data. Liu et al. [27] proposed an efficient quick attribute reduction algorithm based on the neighborhood rough set model. Another example extension of the traditional rough set model is the covering-based rough set model [28], [29], [30], which relaxes the partition to a covering. The covering of a universe is used to define the upper and lower approximations of any subset of the universe. Most studies of the covering-based rough set model focus on the construction of the upper and lower approximations. In [31], [32], [33], [34], covering-based approximation operators were discussed according to the element, granule, and subsystem based definitions. However, less effort has been devoted to the construction of coverings in numerical-valued information systems. In addition, there are only limited studies on the importance measure for covering decision tables.

In this paper, we address the issue of attribute importance measurement in covering decision tables and propose an effective knowledge change-based importance measure for them. The similarity block induced by an attribute is given, under which coverings of a universe with numerical attributes are constructed. Further, the covering approximation operators are presented. In addition, the traditional importance measure is extended to deal with covering decision tables, and the knowledge change-based importance measure with the structural characteristics of a fuzzy measure is presented. Experimental results demonstrate that the knowledge change-based importance measure is effective and suitable for evaluating the importance of attributes in a covering decision table. Experimental results also indicate that the knowledge change-based importance measure outperforms the traditional importance measure in covering decision tables. The main contributions of the work are twofold. First, we give a concrete construction strategy for coverings to deal with data with numerical or mixed attributes; second, a knowledge change-based importance measure, which meets the conditions of a fuzzy measure, is presented for covering decision tables.
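To make the phrase "meets the conditions of a fuzzy measure" concrete, the sketch below checks the conditions conventionally required of a normalized fuzzy measure on the subsets of an attribute set: the empty set has measure 0, the full set has measure 1, and the measure never decreases when attributes are added. The checker, the dictionary representation and the example values are assumptions introduced here, not the paper's construction.

```python
from itertools import combinations


def powerset(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_fuzzy_measure(mu, attributes, tol=1e-12):
    """Check boundary conditions and monotonicity of mu (dict: frozenset -> float)."""
    full = frozenset(attributes)
    # Boundary conditions: mu(empty set) = 0 and mu(full attribute set) = 1.
    if abs(mu[frozenset()]) > tol or abs(mu[full] - 1.0) > tol:
        return False
    subsets = list(powerset(attributes))
    # Monotonicity: S subset of T implies mu(S) <= mu(T).
    for s in subsets:
        for t in subsets:
            if s <= t and mu[s] > mu[t] + tol:
                return False
    return True


# Hypothetical importance values for two attributes; note they need not be additive.
mu_ok = {
    frozenset(): 0.0,
    frozenset({"a1"}): 0.3,
    frozenset({"a2"}): 0.6,
    frozenset({"a1", "a2"}): 1.0,
}
# Same values, except that one single attribute exceeds the full set.
mu_bad = dict(mu_ok)
mu_bad[frozenset({"a1"})] = 1.2

ok = is_fuzzy_measure(mu_ok, ["a1", "a2"])
bad = is_fuzzy_measure(mu_bad, ["a1", "a2"])
```

The first assignment passes even though 0.3 + 0.6 differs from 1: a fuzzy measure only has to be monotone, which is what lets it express interaction among attributes.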

The rest of the paper is organized as follows. Section 2 reviews some related concepts in rough sets. In Section 3, we introduce the concept of a similarity block in attribute space to construct coverings of the universe. Further, the approximation operators in covering approximation space are given. We also present the extended importance measure and the knowledge change-based importance measure in covering decision tables. In Section 4, simulation experiments are carried out to evaluate the validity of the proposed method. The evaluation model with respect to the knowledge change-based importance measure is constructed in Section 5, and Section 6 shows a case study on students’ overall quality evaluation. Section 7 concludes this paper with some remarks and discussions.

Section snippets

Preliminaries

In this section, we briefly review some basic concepts related to rough sets and covering.

An information system in rough sets is a quadruple (U, A, v, f), where U is a non-empty finite set of objects, A is a non-empty finite set of attributes, v is the value set of the attributes, and f is the mapping function f: U × A → v. In particular, (U, A ∪ {d}, v, f) is called a decision table, where A denotes the condition attributes and d denotes the decision attribute.
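The definition above can be mirrored in a small data structure. The sketch below stores a decision table as a list of rows, with row index i standing for object i of U and f(x, a) read as rows[x][a], and computes the classical positive-region dependency degree together with the attribute significance derived from it. This is the standard discrete-valued construction and is given only as an assumed baseline; all names and the toy data are hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group object indices into blocks with identical values on attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())


def positive_region(rows, cond_attrs, decision):
    """Union of condition blocks that fall entirely inside one decision class."""
    decision_classes = partition(rows, [decision])
    pos = set()
    for block in partition(rows, cond_attrs):
        if any(block <= d_class for d_class in decision_classes):
            pos |= block
    return pos


def dependency(rows, cond_attrs, decision):
    """Dependency degree: fraction of objects lying in the positive region."""
    return len(positive_region(rows, cond_attrs, decision)) / len(rows)


def significance(rows, attr, cond_attrs, decision):
    """Drop in dependency degree when attr is removed from cond_attrs."""
    reduced = [a for a in cond_attrs if a != attr]
    return dependency(rows, cond_attrs, decision) - dependency(rows, reduced, decision)


# Hypothetical decision table with condition attributes a1, a2 and decision d.
rows = [
    {"a1": 1, "a2": 0, "d": 0},
    {"a1": 1, "a2": 1, "d": 1},
    {"a1": 0, "a2": 0, "d": 0},
    {"a1": 0, "a2": 0, "d": 1},
]
gamma = dependency(rows, ["a1", "a2"], "d")
sig_a1 = significance(rows, "a1", ["a1", "a2"], "d")
sig_a2 = significance(rows, "a2", ["a1", "a2"], "d")
```

Objects 2 and 3 agree on both condition attributes but differ on d, so only objects 0 and 1 lie in the positive region and the dependency degree is 0.5.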

In Pawlak’s rough set theory [1], the objects with

Importance measurement approaches in covering decision table

The traditional importance measure of attributes is mainly suitable for discrete-valued information systems. In the following, the similarity block is introduced to deal with numerical attributes in the system, and two kinds of importance measures in the covering decision table are given.
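The precise definition of the similarity block is not reproduced in this excerpt, so the following sketch rests on a stated assumption: the block of an object x under a numerical attribute a is taken to be all objects whose value differs from a(x) by at most a threshold delta. Since each object belongs to its own block, the blocks form a covering of U. The lower and upper approximations shown are one commonly used granule-based pair and may differ from the operators the authors define; function names, delta and the data are illustrative only.

```python
def similarity_blocks(values, delta):
    """Covering of object indices: one block per object, |a(x) - a(y)| <= delta (assumed form)."""
    n = len(values)
    return {
        frozenset(j for j in range(n) if abs(values[i] - values[j]) <= delta)
        for i in range(n)
    }


def covering_lower(covering, target):
    """Union of the blocks fully contained in target."""
    result = set()
    for block in covering:
        if block <= target:
            result |= block
    return result


def covering_upper(covering, target):
    """Union of the blocks that intersect target."""
    result = set()
    for block in covering:
        if block & target:
            result |= block
    return result


# Hypothetical values of one numerical attribute for four objects.
values = [1.0, 1.5, 2.0, 5.0]
covering = similarity_blocks(values, delta=0.5)

target = {0, 1}
lower = covering_lower(covering, target)
upper = covering_upper(covering, target)
```

Unlike the classes of an equivalence relation, the blocks here overlap: object 1 lies in three blocks. That overlap is exactly what a partition cannot represent, and it is why the upper approximation of {0, 1} also picks up object 2.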

Experiments

In order to demonstrate the advantages of the proposed importance measure in this paper, some experiments are conducted on three public data sets, which are Iris, Liver disorders and Heart disease. The data sets are downloaded from the machine learning data repository, University of California at Irvine [37]. The experimental data are randomly selected from the three data sets. Table 4 shows the experimental data of Iris with thirty samples and four numerical condition attributes. Table 5 shows

Evaluation model with respect to knowledge change-based importance measure

Besides feature selection and attribute reduction, the importance measure also plays a significant part in comprehensive evaluation and multi-attribute decision making. To construct the evaluation model, another factor that needs consideration is the selection of the aggregation operator. The ordered weighted averaging (OWA) operator introduced by Yager is a parameterized family of aggregation operators that has been used in many applications. But an important issue in the application of OWA
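For readers unfamiliar with the operator mentioned above, the following minimal sketch implements the standard OWA definition: the arguments are sorted in descending order and the j-th weight is applied to the j-th largest value, with non-negative weights summing to 1. It illustrates the operator only, not the evaluation model built in this section; the function name and the scores are assumptions.

```python
def owa(values, weights):
    """Ordered weighted average: weights attach to rank positions, not to sources."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(values, reverse=True)  # largest argument first
    return sum(w * v for w, v in zip(weights, ordered))


# Hypothetical attribute scores for one object.
scores = [0.6, 0.9, 0.3]

as_max = owa(scores, [1.0, 0.0, 0.0])    # all weight on the largest value
as_min = owa(scores, [0.0, 0.0, 1.0])    # all weight on the smallest value
as_mean = owa(scores, [1 / 3, 1 / 3, 1 / 3])
```

The three weight vectors recover the maximum, the minimum and the arithmetic mean, which shows how the choice of weights moves the operator between optimistic and pessimistic aggregation.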

An illustration of proposed method

In this section, a case study on students’ overall quality evaluation is conducted by using the MIM. We randomly select some evaluation data from a college’s website (www.cdsygz.com/zongheceping/putongrenyuan/login.asp), as presented in Table 10. These data show the overall quality evaluation results of a class majoring in tourism during 2012–2013. The attributes used in the overall quality evaluation include morality (a₁), academic record (a₂) and physical quality (a₃), and development quality

Conclusions

In this paper, we have studied the measurement of attribute importance in covering decision tables. The original importance measure in traditional rough sets is extended to deal with covering decision tables. In addition, we propose the knowledge change-based importance measure, which reflects the certain knowledge as well as the relevant knowledge in a covering decision table. Theoretical analysis shows that the importance of attributes monotonically increases with the attribute subset. The

Acknowledgements

This work is supported by National Natural Science Foundation of China (71371064) and the Natural Science Foundation of Hebei Province (F2015208100, F2015208099).

References (38)

Y.H. Qian et al., Positive approximation: an accelerator for attribute reduction in rough set theory, Artif. Intell. (2010)
X.D. Yue et al., Multiscale roughness measure for color image segmentation, Inf. Sci. (2012)
J.B. Zhang et al., Composite rough sets for dynamic data mining, Inf. Sci. (2014)
G.P. Lin et al., NMGRS: neighborhood-based multigranulation rough sets, Int. J. Approx. Reason. (2012)
J.H. Dai et al., Attribute selection based on a new conditional entropy for incomplete decision systems, Knowl.-Based Syst. (2013)
Y.Y. Yao et al., A measurement theory view on the granularity of partitions, Inf. Sci. (2012)
Y.Y. Yao et al., Quantitative rough sets based on subsethood measures, Inf. Sci. (2014)
Z. Meng et al., A fast approach to attribute reduction in incomplete decision systems with tolerance relation-based rough sets, Inf. Sci. (2009)
Y.Y. Yao, Two views of the theory of rough sets in finite universes, Int. J. Approx. Reason. (1996)
Q.H. Hu et al., Mixed feature selection based on granulation and approximation, Knowl.-Based Syst. (2008)
Q.H. Hu et al., Neighborhood rough set based heterogeneous feature subset selection, Inf. Sci. (2008)
G.P. Lin et al., Multigranulation rough sets: from partition to covering, Inf. Sci. (2013)
L.W. Ma, On some types of neighborhood-related covering rough sets, Int. J. Approx. Reason. (2012)
Y.L. Zhang et al., Relationships between covering-based rough sets and relation-based rough sets, Inf. Sci. (2013)
G.L. Liu et al., A comparison of two types of rough sets induced by coverings, Int. J. Approx. Reason. (2009)
C.H. Liu et al., On multi-granulation covering rough sets, Int. J. Approx. Reason. (2014)
M. Restrepo et al., Duality, conjugacy and adjointness of approximation operators in covering-based rough sets, Int. J. Approx. Reason. (2014)
Z. Bonikowski et al., Extensions and intentions in the rough set theory, Inf. Sci. (1998)
Z. Zheng et al., Rule sets based bilevel decision model and algorithm, Expert Syst. Appl. (2009)

