
Knowledge-Based Systems

Volume 86, September 2015, Pages 261-277

Compacted decision tables based attribute reduction

https://doi.org/10.1016/j.knosys.2015.06.013

Abstract

This paper first points out that the reducts obtained from a simplified decision table are different from those obtained from its original version, and that the reducts in the sense of entropies cannot be obtained from a simplified decision table at all. To solve these problems, we propose the compacted decision table, which preserves all the information contained in its original version. We theoretically demonstrate that the order of attributes’ inner significance and outer significance, in the sense of the positive region and of two types of entropies, is preserved after a decision table is compacted, which ensures that the reducts obtained from a compacted decision table are identical to those obtained from its original version. Finally, several numerical experiments indicate the effectiveness and efficiency of the attribute reduction algorithms for a compacted decision table.

Introduction

In many practical applications, the dimensionality of data sets (the number of attributes) is becoming higher and higher [1], [8], [24], [31]. For such high-dimensional data, attributes irrelevant to the recognition task may deteriorate the performance of learning algorithms and result in high computing costs [11], [33]. Therefore, feature selection has become an important preprocessing step in pattern recognition, data mining and machine learning [9], [36].

Among existing feature selection algorithms, supervised feature selection algorithms are commonly employed to process data with class labels; representatives include the feature selection algorithm based on mRMR [32], feature selection with sparsity-inducing norms [14], feature selection algorithms based on the t-test [44], [45], feature subset selection with ordinal optimization [5] and feature selection based on neighborhood multi-granulation fusion [25]. One of the critical issues in feature selection is how to select a feature subset, and filter, wrapper and embedded methods have been generally recognized as the most popular ways to address it [2], [8]. In filter methods [16], [17], the selection of feature subsets is independent of the chosen learning machine. In wrapper methods [18], the selection of feature subsets depends on the learning machine, which scores subsets of features according to their predictive power. In embedded methods [8], [18], the selection of feature subsets is part of the training process and is embedded in the learning machine. Generally speaking, filter and embedded methods are more efficient than wrapper methods, while wrapper and embedded methods are more effective than filter methods [8].

Attribute reduction is an important research area in rough set theory [4], [28], [29], [30]. From the perspective of feature selection, attribute reduction is a specific kind of supervised feature selection that adopts the filter approach. In recent years, researchers have introduced many attribute reduction algorithms. Skowron and Rauszer [40] proposed an attribute reduction algorithm based on the discernibility matrix, by which all reducts can be obtained. Hu and Cercone [10] introduced the discernibility matrix into decision tables. Ye and Chen [57] found that the method in [10] can obtain the reducts only for a consistent decision table, and proposed a modified discernibility matrix that is suitable for an inconsistent decision table. Yang [54], by considering the discernibility information in the consistent and inconsistent parts of a decision table separately, proposed another decision-relative discernibility matrix, by which the time of computing reducts is significantly reduced. Wei et al. [51] proposed two discernibility matrices in the sense of Shannon entropy and complement entropy, which effectively expands the application range of attribute reduction methods based on the discernibility matrix. However, the problem of finding all reducts via these discernibility matrices has been proven to be NP-hard [52], [56].

To solve the above problem, researchers introduced heuristic search strategies into algorithms for finding reducts, which remarkably lessens their computational burden. Hu and Cercone [10] proposed a heuristic attribute reduction algorithm in which the positive region is utilized as the attribute significance measure and stopping criterion. Slezak [38], [39] first introduced an attribute reduction algorithm in the sense of Shannon entropy. Wang et al. [46], [47] further improved this kind of algorithm. Subsequently, Liang et al. [20], [21], [22], [49], by introducing complement entropy to assess attribute significance and serve as the stopping criterion, defined a new type of attribute reduction algorithm based on complement entropy. To deal with hybrid data containing numerical and categorical attributes, attribute reduction algorithms based on fuzzy rough sets and rough fuzzy sets were proposed in [3], [12], [13], [37], [41], [50]. Additionally, plenty of attribute reduction methods were introduced to process incomplete data [26], [27]. Yao and Zhao proposed attribute reduction methods in decision-theoretic rough sets [55], which can achieve the objective of minimizing the cost of decisions [15]. Although these heuristic algorithms have sped up the process of finding reducts, they are still inefficient when dealing with large data.
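
To make this heuristic concrete, here is a minimal Python sketch (our illustration, not the implementation used in the cited papers) of the positive-region significance measure: an object belongs to the positive region of an attribute subset B when its B-equivalence class is consistent, i.e. all of its members carry the same decision value, and the outer significance of a candidate attribute is the positive-region gain from adding it to B.

```python
from collections import defaultdict

def positive_region_size(rows, cond_idx, dec_idx):
    """Count the objects whose condition equivalence class (w.r.t. the
    attributes in cond_idx) is consistent on the decision attribute."""
    decisions = defaultdict(set)  # condition tuple -> decision values seen
    sizes = defaultdict(int)      # condition tuple -> number of objects
    for row in rows:
        key = tuple(row[i] for i in cond_idx)
        decisions[key].add(row[dec_idx])
        sizes[key] += 1
    return sum(n for key, n in sizes.items() if len(decisions[key]) == 1)

def outer_significance(rows, B, a, dec_idx):
    """Positive-region gain obtained by adding attribute a to subset B."""
    return (positive_region_size(rows, B + [a], dec_idx)
            - positive_region_size(rows, B, dec_idx))
```

A greedy algorithm of this family repeatedly adds the attribute with the largest outer significance until the positive region equals that of the full attribute set.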

To further improve heuristic attribute reduction algorithms, Qian et al. [34] proposed an acceleration mechanism in which the objects useless for finding reducts are progressively deleted in each iteration. A similar idea to that in [34] was developed to deal with incomplete data sets and hybrid data sets [35], [48]. However, in [34], [35], [48], only the useless objects are gradually deleted from data sets. In fact, the number of attributes also largely affects the efficiency of attribute reduction algorithms. Based on this consideration, Liang et al. [23] developed a more effective attribute reduction algorithm in which both the useless objects and the irrelevant attributes are progressively removed from data sets in the process of finding reducts.

However, all the objects in one equivalence class are dealt with one by one when running the algorithms mentioned above, even though they have the same value on each condition attribute. This duplicated counting obviously results in unnecessary time consumption. To address this issue, some researchers introduced several homomorphisms of an information system, by which a massive information system can be compacted into a relatively small-scale one while all its reducts remain unchanged under the condition of homomorphism [6], [7], [19], [42], [43]. Furthermore, to remove the redundancy of a decision table, Xu et al. [53] proposed the simplified decision table, in which all the objects in a condition equivalence class are represented by one object of that class. The attribute reduction algorithms based on the simplified decision table thus become more efficient than the previous ones. It is worth noticing, however, that the objects in one condition equivalence class may take different values on the decision attribute. In other words, the simplification of a decision table in [53] can lose values on the decision attribute. It is precisely this fault of the simplified decision table that motivates us to seek a new method which can not only eliminate the repetition of condition attribute values, but also preserve all the information on decision attributes.
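
The difference can be seen on a three-object toy table (hypothetical data, our own sketch): two objects share the same condition values but disagree on the decision, so a simplified table that keeps one representative per condition class silently drops a decision value, while a compaction that stores the whole decision-value distribution loses nothing.

```python
from collections import Counter, defaultdict

table = [  # (a1, a2, d)
    (0, 0, 'yes'),
    (1, 0, 'yes'),
    (1, 0, 'no'),  # same condition class as the previous object
]

# Simplified table: one representative per condition class;
# the decision 'no' of the third object disappears.
simplified = {}
for a1, a2, d in table:
    simplified.setdefault((a1, a2), d)
print(simplified)        # {(0, 0): 'yes', (1, 0): 'yes'}

# Compacted table: one row per condition class plus the full
# distribution of decision values, so no information is lost.
compacted = defaultdict(Counter)
for a1, a2, d in table:
    compacted[(a1, a2)][d] += 1
print(dict(compacted))   # {(0, 0): Counter({'yes': 1}),
                         #  (1, 0): Counter({'yes': 1, 'no': 1})}
```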

Based on the above analysis, in this paper we first point out that the reducts obtained from a simplified decision table are different from those obtained from its original version. We then propose the compacted decision table, and demonstrate that the order of inner significance and outer significance in the sense of the positive region is preserved after a decision table is compacted. Next, we show that the reducts in the sense of Shannon entropy and complement entropy cannot be acquired from a simplified decision table, and demonstrate that they can be obtained from a compacted decision table. Subsequently, we design three algorithms based on the proposed compacted decision table to find the reducts in the sense of the positive region, Shannon entropy and complement entropy. Finally, several numerical experiments are carried out to verify that our proposed algorithms are more efficient than the existing ones.

The remainder of the paper is organized as follows. In Section 2, some preliminaries about rough set theory and attribute reduction algorithms are reviewed. In Section 3, we point out the fault of the simplified decision table, propose the compacted decision table, demonstrate that the order of inner and outer significance in the sense of the positive region is preserved, and design a new positive region attribute reduction algorithm. In Section 4, based on the proposed compacted decision table, we demonstrate that the order of inner and outer significance in the sense of Shannon entropy and complement entropy is preserved, and give the corresponding attribute reduction algorithms. In Section 5, several numerical experiments are carried out to indicate the effectiveness and efficiency of the proposed algorithms. Section 6 concludes the paper with some remarks.


Rough set

An information system (also known as a data table, an attribute–value system, or a knowledge representation system) is a 4-tuple $S=(U,A,V,f)$ (for short, $S=(U,A)$), where $U$ is a non-empty, finite set of objects, called the universe; $A$ is a non-empty, finite set of attributes; $V_a$ is the domain of the attribute $a$; $V=\bigcup_{a\in A}V_a$; and $f:U\times A\to V$ is a function such that $f(x,a)\in V_a$ for each $a\in A$ [28].

Each attribute subset $B\subseteq A$ derives an indiscernibility relation in the following way: $R_B=\{(x,y)\in U\times U \mid f(x,a)=f(y,a),\ \forall a\in B\}$.
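
As an illustration, a direct Python rendering of this definition (a sketch with our own naming) groups objects by their values on B; the resulting blocks are exactly the equivalence classes of the partition $U/R_B$.

```python
from collections import defaultdict

def partition(rows, B):
    """Return the equivalence classes of R_B as lists of row indices,
    i.e. the partition U/R_B induced by the attribute subset B."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in B)].append(i)
    return list(blocks.values())
```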

Simplified decision tables and compacted decision tables

In this section, we first point out, by means of a concrete example, that the order of attribute significance in a simplified decision table is inconsistent with that in its original version. To solve this issue, we propose a new kind of decision table, the compacted decision table, which preserves all the information that its corresponding original decision table has. We further demonstrate that the order of attribute significance remains unchanged after a decision table is compacted. Finally, we design a new positive region attribute reduction algorithm based on the compacted decision table.
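
The formal construction is given in the paper itself; as a rough sketch of the underlying bookkeeping (our illustration, with hypothetical names), a compacted table keeps one row per condition equivalence class together with a vector counting each decision value inside the class, which are the quantities later needed for the positive region and the entropies.

```python
from collections import defaultdict

def compact(rows, cond_idx, dec_idx, dec_values):
    """One output row per condition class: the condition values followed
    by a vector of decision-value counts (|X_i ∩ Y_j| for each Y_j)."""
    pos = {d: j for j, d in enumerate(dec_values)}
    counts = defaultdict(lambda: [0] * len(dec_values))
    for row in rows:
        key = tuple(row[i] for i in cond_idx)
        counts[key][pos[row[dec_idx]]] += 1
    return [key + (tuple(vec),) for key, vec in counts.items()]
```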

Shannon entropy and complement entropy attribute reduction based on compacted decision tables

From the analysis in the preceding section, we can see that the simplified decision table discards some decision values of objects, which makes it impossible to compute the terms $|X_i|$ and $|X_i\cap Y_j|$ in the expressions of the entropies. Therefore, Shannon entropy and complement entropy cannot be computed by means of a simplified decision table. To solve this problem, in this section we propose the Shannon condition entropy and complement condition entropy for a compacted decision table and design the corresponding attribute reduction algorithms.
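
Assuming the usual definitions of the two conditional entropies (a sketch of ours, not necessarily the paper's exact formulation), the counts stored in a compacted row are all that is needed: $|X_i|$ is the sum of the row's decision-count vector and $|X_i\cap Y_j|$ is its j-th entry.

```python
import math

def shannon_conditional_entropy(compacted, n):
    """H(D|B) = -sum_i (|X_i|/|U|) sum_j (|X_i∩Y_j|/|X_i|) log2(|X_i∩Y_j|/|X_i|),
    computed from compacted rows whose last element is the count vector;
    n = |U|. The base-2 logarithm is a convention."""
    h = 0.0
    for *_, vec in compacted:
        xi = sum(vec)
        for c in vec:
            if c:
                h -= (xi / n) * (c / xi) * math.log2(c / xi)
    return h

def complement_conditional_entropy(compacted, n):
    """E(D|B) = sum_i sum_j (|X_i∩Y_j|/|U|) * ((|X_i| - |X_i∩Y_j|)/|U|)."""
    return sum((c / n) * ((sum(vec) - c) / n)
               for *_, vec in compacted for c in vec)
```

Both functions run directly on the output of the `compact` sketch above, e.g. `shannon_conditional_entropy(compact(table, [0, 1], 2, ['yes', 'no']), len(table))`.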

Experimental analysis

To verify the theoretical results mentioned above, in this section we carry out several comparative experiments: between ACC-PR, AR-ST-PR and AR-CT-PR; between ACC-SCE and AR-CT-SCE; and between ACC-CCE and AR-CT-CCE. The hardware used in these experiments is a personal computer equipped with an Intel Core i3 and 2 GB of memory, and the operating system and programming language are Windows 7 and C#, respectively. Twelve data sets from the UCI repository of machine learning databases are employed in the experiments.

Conclusion

In this paper, we first pointed out that attribute reduction algorithms for a simplified decision table have two key faults: (1) the reducts obtained from a simplified decision table are different from the ones obtained from its original version; (2) the reducts in the sense of Shannon entropy and complement entropy cannot be obtained from a simplified decision table. We further found that the reason behind these two faults is essentially the loss of the values on the decision attribute.

Acknowledgements

The research was supported by the National Natural Science Foundation of China (Nos. 61303008, 61202018, 61432011, and U1435212), the National Key Basic Research and Development Program of China (973) (No. 2013CB329404), the Natural Science Foundation of Shanxi Province, China (No. 2013021018-1), and the Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi, China (No. 2013102).

References (57)

  • J.Y. Liang et al., A new measure of uncertainty based on knowledge granulation for rough sets, Inf. Sci. (2009)
  • J.Y. Liang et al., An accelerator for attribute reduction based on perspective of objects and attributes, Knowl.-Based Syst. (2013)
  • F.Y. Lin et al., Novel feature selection methods to financial distress prediction, Expert Syst. Appl. (2014)
  • Y.J. Lin et al., Feature selection via neighborhood multi-granulation fusion, Knowl.-Based Syst. (2014)
  • Z.C. Lu et al., A fast feature selection approach based on rough set boundary regions, Pattern Recogn. Lett. (2014)
  • Z.Q. Meng et al., A fast approach to attribute reduction in incomplete decision systems with tolerance relation-based rough sets, Inf. Sci. (2009)
  • Z. Pawlak et al., Rudiments of rough sets, Inf. Sci. (2007)
  • Z. Pawlak et al., Rough sets and Boolean reasoning, Inf. Sci. (2007)
  • W. Pedrycz et al., Feature analysis through information granulation and fuzzy sets, Pattern Recogn. (2002)
  • Y.H. Qian et al., Measures for evaluating the decision performance of a decision table in rough set theory, Inf. Sci. (2008)
  • Y.H. Qian et al., Positive approximation: an accelerator for attribute reduction in rough set theory, Artif. Intell. (2010)
  • Y.H. Qian et al., An efficient accelerator for attribute reduction from incomplete data in rough set framework, Pattern Recogn. (2011)
  • R. Sikora et al., Framework for efficient feature selection in genetic algorithm based data mining, Eur. J. Oper. Res. (2007)
  • Q. Shen et al., Selecting informative features with fuzzy-rough sets and its application for complex systems monitoring, Pattern Recogn. (2004)
  • C.Z. Wang et al., Data compression with homomorphism in covering information systems, Int. J. Approx. Reason. (2011)
  • C.Z. Wang et al., Fuzzy information systems and their homomorphisms, Fuzzy Sets Syst. (2014)
  • D.Q. Wang et al., t-Test feature selection approach based on term frequency for text categorization, Pattern Recogn. Lett. (2014)
  • W. Wei et al., A comparative study of rough sets for hybrid data, Inf. Sci. (2012)