Elsevier

Knowledge-Based Systems

Volume 76, March 2015, Pages 228-239
Constructing importance measure of attributes in covering decision table

https://doi.org/10.1016/j.knosys.2014.12.018

Abstract

In rough set theory, the attribute importance measure is a crucial factor in attribute reduction and feature selection. Many importance measures have been developed for discrete-valued information systems and decision tables. However, there are only limited studies on importance measurement for numerical-valued information systems. In this paper, a knowledge change-based importance measure, which has the structural characteristics of a fuzzy measure, is introduced to evaluate the importance of attributes in a covering decision table. We first present the concept of a similarity block in attribute space, based on which coverings are induced to construct the lower and upper approximation operators in covering-based rough sets. In particular, the traditional importance measure is extended to deal with covering decision tables. Further, an evaluation model based on the knowledge change-based importance measure is constructed. Experiments are conducted on public data sets from UCI, and a case study on students’ overall evaluation is then given. Theoretical analysis and experimental results show that the proposed importance measure is effective for evaluating the importance of attributes in a covering decision table.

Introduction

Rough set theory [1], proposed by Pawlak in the early 1980s, is a mathematical tool for dealing with uncertain and incomplete information. It is an important tool for knowledge discovery in information systems. Over the last decades, there has been much work on information systems with rough sets. Data mining based on rough sets is widely employed to obtain knowledge from databases. It has also found successful applications in attribute reduction, machine learning, pattern recognition, image processing and decision-making analysis on large data sets [2], [3], [4], [5], [6], [7].

In attribute reduction and feature selection methods, the attribute importance measure is a crucial factor. To evaluate the importance of attributes in an information system, several importance measures have been proposed in rough set theory. Many authors have studied importance measures of attributes based on the positive region, Shannon’s conditional entropy, complement conditional entropy, combination conditional entropy and rough entropy. The idea of attribute reduction using the positive region originated with Grzymala-Busse [8], [9]. In the classical rough set model, the conditional entropy of Shannon’s information entropy was introduced in [10], [11] to find the relative reduct of a decision table. In [12], the conditional entropy derived from the complement entropy was used to measure the importance of attributes. Qian and Liang [13] presented combination entropy to measure the uncertainty of information systems and used its conditional entropy to obtain an attribute subset. Sun et al. [14] proposed a new rough entropy associated with partitions, and presented an importance measure of attributes on the basis of this entropy. Yao et al. [15], [16], [17] studied several kinds of information entropy measures for attribute importance in rough set theory. However, the studies mentioned above focus mainly on discrete-valued information systems or decision systems.

Traditional rough set theory is suitable for discrete data rather than continuous data, since only partitions or equivalence relations are considered. As pointed out by some researchers, the partition or equivalence relation in traditional rough set theory is too restrictive for practical applications. Many practical data sets, in which some attributes are numerical, cannot be handled by partitions. To address this issue, some scholars extended the equivalence relation to similarity relations [18], tolerance relations [19], [20], and even general binary relations [21], [22]. One such extension is the neighborhood-based rough set model. Lin [23] pointed out that neighborhood spaces are more general topological spaces than equivalence spaces, and introduced the neighborhood relation into rough set theory. The properties of neighborhood approximation spaces were discussed in [24]. Hu et al. [25], [26] constructed a unified theoretical framework for the neighborhood-based rough set model and a feature selection algorithm for hybrid data. Liu et al. [27] proposed an efficient quick attribute reduction algorithm based on the neighborhood rough set model. Another extension of the traditional rough set model is the covering-based rough set model [28], [29], [30], which relaxes the partition to a covering. The covering of a universe is used to define the upper and lower approximations of any subset of the universe. Most studies of the covering-based rough set model focus on the construction of the upper and lower approximations. In [31], [32], [33], [34], covering-based approximation operators were discussed according to the element-, granule-, and subsystem-based definitions. However, less effort has been devoted to the construction of coverings in numerical-valued information systems. In addition, there are only limited studies on importance measures for covering decision tables.

In this paper, we address the issue of attribute importance measurement in covering decision tables and propose an effective knowledge change-based importance measure for them. The similarity block induced by an attribute is defined, and coverings of a universe with numerical attributes are constructed from it. Further, the covering approximation operators are presented. In addition, the traditional importance measure is extended to deal with covering decision tables, and the knowledge change-based importance measure with the structural characteristics of a fuzzy measure is presented. Experimental results demonstrate that the knowledge change-based importance measure is effective and suitable for evaluating the importance of attributes in a covering decision table, and that it outperforms the traditional importance measure in this setting. The main contributions of the work are twofold. First, we give a concrete strategy for constructing coverings to deal with data with numerical or mixed attributes; second, we present a knowledge change-based importance measure for covering decision tables that meets the conditions of a fuzzy measure.

The rest of the paper is organized as follows. Section 2 reviews some related concepts in rough sets. In Section 3, we introduce the concept of a similarity block in attribute space to construct coverings of the universe. Further, the approximation operators in the covering approximation space are given. We also present the extended importance measure and the knowledge change-based importance measure in covering decision tables. In Section 4, simulation experiments are carried out to evaluate the validity of the proposed method. The evaluation model with respect to the knowledge change-based importance measure is constructed in Section 5, and Section 6 shows a case study on students’ overall quality evaluation. Section 7 concludes this paper with some remarks and discussions.

Preliminaries

In this section, we briefly review some basic concepts related to rough sets and covering.

An information system in rough sets is a quadruple (U, A, v, f), where U is a non-empty finite set of objects, A is a non-empty finite set of attributes, v is the value set of the attributes and f is the mapping function f: U × A → v. In particular, (U, A ∪ {d}, v, f) is called a decision table, where A denotes the condition attributes and d denotes the decision attribute.
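As a minimal sketch of this definition, the following Python fragment stores a small decision table and evaluates the classical positive-region importance of each condition attribute. The data, the attribute names `a1` and `a2`, and all function names are hypothetical and chosen only for illustration. The measure shown is the standard Pawlak-style construction for discrete attributes, not the covering-based measure proposed in this paper.

```python
from collections import defaultdict

# Toy decision table (hypothetical data): objects U, condition attributes A, decision d.
TABLE = {
    "x1": {"a1": 0, "a2": 1, "d": "yes"},
    "x2": {"a1": 0, "a2": 1, "d": "no"},
    "x3": {"a1": 1, "a2": 0, "d": "no"},
    "x4": {"a1": 1, "a2": 1, "d": "yes"},
}
U = sorted(TABLE)
A = ["a1", "a2"]


def f(x, a):
    """Mapping function f: value of attribute a on object x."""
    return TABLE[x][a]


def partition(attrs):
    """Group objects that agree on every attribute in attrs (equivalence classes)."""
    blocks = defaultdict(set)
    for x in U:
        blocks[tuple(f(x, a) for a in attrs)].add(x)
    return list(blocks.values())


def positive_region(attrs):
    """Union of the equivalence classes that fall inside a single decision class."""
    decision_classes = partition(["d"])
    pos = set()
    for block in partition(attrs):
        if any(block <= dc for dc in decision_classes):
            pos |= block
    return pos


def dependency(attrs):
    """Fraction of objects that attrs classify with certainty."""
    return len(positive_region(attrs)) / len(U)


def significance(a):
    """Classical importance of a: drop in dependency when a is removed from A."""
    rest = [b for b in A if b != a]
    return dependency(A) - dependency(rest)


for a in A:
    print(a, significance(a))
```

In this toy table, removing `a2` loses more certain classifications than removing `a1`, so `a2` receives the larger importance value.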

In Pawlak’s rough set theory [1], the objects with

Importance measurement approaches in covering decision table

The traditional importance measure of attributes is mainly suitable for discrete-valued information systems. In the following, the similarity block is introduced to deal with numerical attributes in the system, and two kinds of importance measures in the covering decision table are given.
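The general idea of inducing a covering from a numerical attribute and approximating a target set with it can be sketched as follows. This excerpt does not give the paper’s definition of a similarity block, so the sketch assumes a simple distance threshold `delta`: two objects are grouped when their attribute values differ by at most `delta`. The approximation pair used is one common granule-based definition and may differ from the operators adopted in the paper. All names and data are hypothetical.

```python
def induced_covering(values, delta):
    """One block per object: every object whose value lies within delta of it.

    ASSUMPTION: this threshold rule stands in for the paper's similarity block.
    """
    return [
        frozenset(y for y, vy in values.items() if abs(vx - vy) <= delta)
        for vx in values.values()
    ]


def lower_approximation(covering, target):
    """Union of the blocks that are entirely contained in target."""
    return set().union(*(k for k in covering if k <= target))


def upper_approximation(covering, target):
    """Union of the blocks that intersect target."""
    return set().union(*(k for k in covering if k & target))


# Hypothetical values of one numerical attribute on four objects.
values = {"x1": 1.0, "x2": 1.4, "x3": 1.8, "x4": 3.0}
covering = induced_covering(values, delta=0.5)
target = {"x1", "x2"}

print(sorted(lower_approximation(covering, target)))
print(sorted(upper_approximation(covering, target)))
```

Unlike a partition, the blocks here overlap: `x2` is within the threshold of both `x1` and `x3`, so it appears in several blocks. This overlap is what distinguishes a covering from the equivalence classes of the classical model.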

Experiments

In order to demonstrate the advantages of the proposed importance measure in this paper, some experiments are conducted on three public data sets, which are Iris, Liver disorders and Heart disease. The data sets are downloaded from the machine learning data repository, University of California at Irvine [37]. The experimental data are randomly selected from the three data sets. Table 4 shows the experimental data of Iris with thirty samples and four numerical condition attributes. Table 5 shows

Evaluation model with respect to knowledge change-based importance measure

Besides feature selection and attribute reduction, the importance measure also plays a significant part in comprehensive evaluation and multi-attribute decision making. To construct the evaluation model, another factor that needs consideration is the selection of the aggregation operator. The ordered weighted averaging (OWA) operator introduced by Yager is a parameterized family of aggregation operators that has been used in many applications. But an important issue in the application of OWA

An illustration of proposed method

In this section, a case study on students’ overall quality evaluation is conducted by using the MIM. We randomly select some evaluation data from a college’s website (www.cdsygz.com/zongheceping/putongrenyuan/login.asp), as presented in Table 10. These data show the overall quality evaluation results of a class majoring in tourism during 2012–2013. The attributes used in the overall quality evaluation include morality (a1), academic record (a2) and physical quality (a3), and development quality

Conclusions

In this paper, we have studied the measurement of attribute importance in covering decision tables. The original importance measure in traditional rough sets is extended to deal with covering decision tables. In addition, we propose the knowledge change-based importance measure, which reflects the certain knowledge as well as the relevant knowledge in a covering decision table. Theoretical analysis shows that the importance of attributes monotonically increases with the attribute subset. The

Acknowledgements

This work is supported by the National Natural Science Foundation of China (71371064) and the Natural Science Foundation of Hebei Province (F2015208100, F2015208099).
