Compacted decision tables based attribute reduction
Introduction
In many practical applications, the dimensionality of data sets (the number of attributes) is becoming higher and higher [1], [8], [24], [31]. In such high-dimensional data, attributes irrelevant to the recognition task may deteriorate the performance of learning algorithms and result in high computational cost [11], [33]. Therefore, feature selection has become an important preprocessing step in pattern recognition, data mining and machine learning [9], [36].
Among existing feature selection algorithms, supervised ones are commonly employed to process data with class labels; representatives include feature selection based on mRMR [32], feature selection with sparsity-inducing norms [14], feature selection based on the t-test [44], [45], feature subset selection with ordinal optimization [5] and feature selection based on neighborhood multi-granulation fusion [25]. A critical issue in feature selection is how to select a feature subset, and filter, wrapper and embedded methods are generally recognized as the most popular ways to address it [2], [8]. In filter methods [16], [17], the selection of feature subsets is independent of the chosen learning machine. In wrapper methods [18], the selection depends on a learning machine that scores subsets of features according to their predictive power. In embedded methods [8], [18], the selection of feature subsets is part of the training process and is embedded in the learning machine. Generally speaking, filter and embedded methods are more efficient than wrapper methods, while wrapper and embedded methods are more effective than filter methods [8].
Attribute reduction is an important research area in rough set theory [4], [28], [29], [30]. From the perspective of feature selection, attribute reduction is a specific kind of supervised feature selection that follows the filter approach. In recent years, researchers have introduced many attribute reduction algorithms. Skowron and Rauszer [40] proposed an attribute reduction algorithm based on the discernibility matrix, by which all reducts can be obtained. Hu and Cercone [10] introduced the discernibility matrix into decision tables. Ye and Chen [57] found that the method in [10] can obtain the reducts only for a consistent decision table, and proposed a modified discernibility matrix suitable for inconsistent decision tables. Yang [54], by considering the discernibility information in the consistent and inconsistent parts of a decision table separately, proposed another decision-relative discernibility matrix, by which the time of computing reducts is significantly reduced. Wei et al. [51] proposed two discernibility matrices in the sense of Shannon entropy and complement entropy, which effectively expands the application range of discernibility-matrix-based attribute reduction methods. However, the problem of finding all reducts using these discernibility matrices has been proved to be NP-hard [52], [56].
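The decision-relative discernibility matrix described above can be sketched as follows. This is a minimal illustration on an invented toy table, not the construction of any particular cited paper: each entry records the condition attributes that distinguish a pair of objects with different decisions, and each non-empty entry is one clause of the discernibility function whose prime implicants give the reducts.

```python
# Toy decision table; attribute names and values are illustrative.
attrs = ["a", "b"]
table = [
    ({"a": 0, "b": 0}, "yes"),
    ({"a": 0, "b": 1}, "no"),
    ({"a": 1, "b": 1}, "no"),
]

def discernibility_matrix(table, attrs):
    """Decision-relative discernibility matrix (sketch): for each pair of
    objects with different decision values, record the condition
    attributes on which their values differ."""
    m = {}
    for i in range(len(table)):
        for j in range(i + 1, len(table)):
            if table[i][1] != table[j][1]:
                # Sorted list for a deterministic representation.
                m[(i, j)] = sorted(
                    a for a in attrs if table[i][0][a] != table[j][0][a]
                )
    return m

print(discernibility_matrix(table, attrs))
# {(0, 1): ['b'], (0, 2): ['a', 'b']}
```

Pairs with equal decisions, such as objects 1 and 2 here, contribute no clause, which is exactly why the matrix is called decision-relative.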
To mitigate this problem, researchers introduced heuristic search strategies into reduct-finding algorithms, which remarkably lessens their computational burden. Hu and Cercone [10] proposed a heuristic attribute reduction algorithm in which the positive region is used to evaluate attribute significance and as the stopping criterion. Slezak [38], [39] first introduced an attribute reduction algorithm in the sense of Shannon entropy. Wang et al. [46], [47] further improved this kind of algorithm in the sense of Shannon entropy. Subsequently, Liang et al. [20], [21], [22], [49], by introducing complement entropy to assess attribute significance and the stopping criterion, defined a new type of attribute reduction algorithm based on complement entropy. To deal with hybrid data containing both numerical and categorical attributes, attribute reduction algorithms based on fuzzy rough sets and rough fuzzy sets were proposed in [3], [12], [13], [37], [41], [50]. Additionally, many attribute reduction methods were introduced to process incomplete data [26], [27]. Yao and Zhao proposed attribute reduction methods in decision-theoretic rough sets [55], which can achieve the objective of minimizing the cost of decisions [15]. Although these heuristic algorithms have sped up the process of finding reducts, they are still inefficient on large data.
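The positive-region criterion used by these heuristics can be illustrated with a small sketch. The toy table and names below are assumptions for illustration only, not the authors' implementation: an object belongs to the positive region POS_B(D) when its condition equivalence class is consistent, i.e. all of its members carry the same decision value.

```python
from collections import defaultdict

# Toy decision table: (condition-attribute values, decision value).
table = [
    ({"a": 0, "b": 1}, "yes"),
    ({"a": 0, "b": 1}, "no"),   # conflicts with the row above
    ({"a": 1, "b": 0}, "yes"),
    ({"a": 1, "b": 1}, "no"),
]

def positive_region(table, attrs):
    """Indices of objects whose condition equivalence class w.r.t. attrs
    is consistent (all members share one decision value)."""
    classes = defaultdict(list)
    for i, (cond, _) in enumerate(table):
        classes[tuple(cond[a] for a in attrs)].append(i)
    pos = []
    for members in classes.values():
        if len({table[i][1] for i in members}) == 1:
            pos.extend(members)
    return sorted(pos)

# The dependency degree |POS_B(D)| / |U| then serves as the
# significance measure driving the heuristic search.
print(positive_region(table, ["a", "b"]))  # [2, 3]
```

Objects 0 and 1 agree on both condition attributes but disagree on the decision, so they fall outside the positive region.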
To further improve heuristic attribute reduction algorithms, Qian et al. [34] proposed an acceleration mechanism in which the objects useless for finding reducts are progressively deleted in each iteration. A similar idea was developed in [35], [48] to deal with incomplete and hybrid data sets. However, in [34], [35], [48], only the useless objects are gradually deleted from the data sets; in fact, the number of attributes also largely affects the efficiency of attribute reduction. Based on this consideration, Liang et al. [23] developed a more effective attribute reduction algorithm in which both the useless objects and the irrelevant attributes are progressively removed from the data set during the process of finding reducts.
However, all the algorithms mentioned above process the objects in one equivalence class one by one, even though these objects have the same value on each condition attribute. This duplicated counting obviously results in unnecessary time consumption. To address this issue, some researchers introduced homomorphisms of an information system, by which a massive information system can be compacted into a relatively small-scale one while all its reducts remain unchanged under the homomorphism [6], [7], [19], [42], [43]. Furthermore, to remove the redundancy of a decision table, Xu et al. [53] proposed the simplified decision table, in which all the objects in a condition equivalence class are represented by one object of that class; attribute reduction algorithms based on the simplified decision table thus become more efficient than the previous ones. It is worth noticing, however, that objects in one condition equivalence class may have different values on the decision attribute. In other words, the simplification of a decision table in [53] may lose decision-attribute values. It is precisely this fault of the simplified decision table that motivates us to seek a new method which can not only eliminate the repetition of condition attribute values, but also preserve all the information on decision attributes.
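The contrast between simplification and compaction can be sketched as follows. This is a minimal illustration under assumed data structures, not the paper's formal construction: the simplified table keeps one representative per condition class and silently drops the other members' decision values, whereas a compacted row keeps the full distribution of decision values for its class.

```python
from collections import Counter, defaultdict

# Toy decision table: (condition-value tuple, decision); values illustrative.
rows = [
    ((0, 1), "yes"), ((0, 1), "yes"), ((0, 1), "no"),
    ((1, 0), "no"),  ((1, 0), "no"),
]

def simplify(rows):
    """Simplified table: one representative object per condition class.
    The decision values of the other class members are discarded."""
    seen = {}
    for cond, dec in rows:
        seen.setdefault(cond, dec)
    return list(seen.items())

def compact(rows):
    """Compacted table (sketch): one row per condition class that keeps
    the counts of every decision value, so no information is lost."""
    acc = defaultdict(Counter)
    for cond, dec in rows:
        acc[cond][dec] += 1
    return {cond: dict(cnt) for cond, cnt in acc.items()}

print(simplify(rows))  # the 'no' on class (0, 1) is lost
print(compact(rows))   # {(0, 1): {'yes': 2, 'no': 1}, (1, 0): {'no': 2}}
```

Both tables have one row per condition class, so they yield the same speed-up over the raw table; only the compacted one still supports measures that depend on the decision-value distribution.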
Based on the above analysis, in this paper we first point out that the reducts obtained from a simplified decision table differ from those obtained from its original version. We then propose the compacted decision table, and demonstrate that the sequences of inner significance and outer significance in the sense of positive region are preserved after a decision table is compacted. We further show that the reducts in the senses of Shannon entropy and complement entropy cannot be acquired from a simplified decision table, but can be obtained from a compacted one. Subsequently, we design three algorithms based on the proposed compacted decision table to find the reducts in the senses of positive region, Shannon entropy and complement entropy. Finally, several numerical experiments are carried out to verify that the proposed algorithms are more efficient than existing ones.
The remainder of the paper is organized as follows. In Section 2, some preliminaries about rough set theory and attribute reduction algorithms are reviewed. In Section 3, we point out the fault of the simplified decision table, propose the compacted decision table, demonstrate that the sequences of inner and outer significance in the sense of positive region are preserved, and design a new positive-region attribute reduction algorithm. In Section 4, based on the proposed compacted decision table, we demonstrate that the sequences of inner and outer significance in the sense of Shannon entropy and complement entropy are preserved, and give the corresponding attribute reduction algorithms. In Section 5, several numerical experiments are carried out to show the effectiveness and efficiency of the proposed algorithms. Section 6 concludes the paper with some remarks.
Section snippets
Rough set
An information system (also known as a data table, an attribute–value system, a knowledge representation system) is a 4-tuple S = (U, A, V, f) (S = (U, A) for short), where U is a non-empty and finite set of objects, called a universe; A is a non-empty and finite set of attributes; V = ∪_{a∈A} V_a, where V_a is the domain of the attribute a; and f: U × A → V is a function with f(x, a) ∈ V_a for each a ∈ A and x ∈ U [28].
Each attribute subset B ⊆ A derives an indiscernibility relation in the following way: IND(B) = {(x, y) ∈ U × U | f(x, a) = f(y, a) for all a ∈ B}. The relation IND(B) partitions U into equivalence classes; U/IND(B) denotes this partition, and [x]_B denotes the equivalence class containing x.
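Computing the partition U/IND(B) amounts to grouping objects by their value tuple on B. The sketch below uses an invented toy table and attribute names purely for illustration:

```python
from collections import defaultdict

# Toy decision table: each row is an object; "a1", "a2" are condition
# attributes and "d" the decision (illustrative names and values).
U = [
    {"a1": 0, "a2": 1, "d": "yes"},
    {"a1": 0, "a2": 1, "d": "no"},
    {"a1": 1, "a2": 0, "d": "yes"},
    {"a1": 1, "a2": 0, "d": "yes"},
]

def partition(objects, attrs):
    """Equivalence classes of IND(attrs): two objects are equivalent
    iff they agree on every attribute in attrs."""
    classes = defaultdict(list)
    for i, x in enumerate(objects):
        classes[tuple(x[a] for a in attrs)].append(i)
    return list(classes.values())

print(partition(U, ["a1", "a2"]))  # [[0, 1], [2, 3]]
```

Objects 0 and 1 land in one class because they agree on both condition attributes, even though their decision values differ; that distinction is exactly what the positive region and the entropies later measure.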
Simplified decision tables and compacted decision tables
In this section, we first point out, by means of a concrete example, that the sequence of attribute significance in a simplified decision table is inconsistent with that in its original version. To solve this issue, we propose a new kind of decision table, the compacted decision table, which preserves all the information of its corresponding original decision table. We further demonstrate that the sequence of attribute significance remains unchanged after compacting a decision table. Finally, we
Shannon entropy and complement entropy attribute reduction based on compacted decision tables
From the analysis in the preceding section, we can see that the simplified decision table discards some decision values of objects, so that certain items in the expressions of the entropies cannot be computed. Therefore, Shannon entropy and complement entropy cannot be computed by means of a simplified decision table. To solve this problem, in this section we propose the Shannon condition entropy and complement condition entropy for a compacted decision table and design the
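The conditional Shannon entropy H(D|B) = -Σ_i p(X_i) Σ_j p(Y_j|X_i) log₂ p(Y_j|X_i) can be evaluated directly from the per-class decision counts that a compacted table stores. The sketch below, with invented toy counts and an assumed dict-of-Counters representation, shows why the simplified table cannot do this: the inner probabilities p(Y_j|X_i) require the full decision-value distribution of each condition class.

```python
import math

# Compacted table sketch: condition class -> decision-value counts
# (toy numbers, not from the paper).
compacted = {
    (0, 1): {"yes": 2, "no": 2},   # maximally mixed class
    (1, 0): {"no": 4},             # pure class contributes 0 entropy
}

def conditional_shannon_entropy(compacted):
    """H(D|B) = -sum_i p(X_i) sum_j p(Y_j|X_i) log2 p(Y_j|X_i),
    computed from the per-class decision counts."""
    n = sum(sum(counts.values()) for counts in compacted.values())
    h = 0.0
    for counts in compacted.values():
        size = sum(counts.values())
        for k in counts.values():
            p = k / size              # p(Y_j | X_i)
            h -= (size / n) * p * math.log2(p)
    return h

print(conditional_shannon_entropy(compacted))  # 0.5
```

Here the mixed class contributes (4/8)·1 bit and the pure class contributes nothing, giving H(D|B) = 0.5.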
Experimental analysis
To verify the theoretical results above, in this section we carry out several comparative experiments: between ACC-PR, AR-ST-PR and AR-CT-PR; between ACC-SCE and AR-CT-SCE; and between ACC-CCE and AR-CT-CCE. The hardware used in these experiments is a personal computer with an Intel Core i3 processor and 2 GB of memory; the operating system is Windows 7 and the implementation language is C#. Twelve data sets from the UCI repository of machine learning databases are employed in the experiments and
Conclusion
In this paper, we first pointed out that the attribute reduction algorithm for a simplified decision table has two key faults: (1) the reducts obtained from a simplified decision table are different from those obtained from its original version; (2) the reducts in the sense of Shannon entropy and complement entropy cannot be obtained from a simplified decision table. We further found that the reason behind these two faults is essentially the loss of the values on
Acknowledgements
The research was supported by the National Natural Science Foundation of China (Nos. 61303008, 61202018, 61432011, and U1435212), the National Key Basic Research and Development Program of China (973) (No. 2013CB329404), the Natural Science Foundation of Shanxi Province, China (No. 2013021018-1), and the Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi, China (No. 2013102).
References (57)
- et al., Selection of relevant features and examples in machine learning, Artif. Intell. (1997)
- et al., Parameterized attribute reduction with Gaussian kernel based fuzzy rough sets, Inf. Sci. (2011)
- et al., A reduct derived from feature selection, Pattern Recogn. Lett. (2012)
- et al., Supervised feature subset selection with ordinal optimization, Knowl.-Based Syst. (2014)
- et al., Subspace based feature selection for pattern recognition, Inf. Sci. (2008)
- et al., Information-preserving hybrid data reduction based on fuzzy-rough techniques, Pattern Recogn. Lett. (2006)
- et al., Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation, Pattern Recogn. (2007)
- et al., Mixed feature selection based on granulation and approximation, Knowl.-Based Syst. (2008)
- et al., Minimum cost attribute reduction in decision-theoretic rough set models, Inf. Sci. (2013)
- et al., Wrappers for feature subset selection, Artif. Intell. (1997)
- A new measure of uncertainty based on knowledge granulation for rough sets, Inf. Sci.
- An accelerator for attribute reduction based on perspective of objects and attributes, Knowl.-Based Syst.
- Novel feature selection methods to financial distress prediction, Expert Syst. Appl.
- Feature selection via neighborhood multi-granulation fusion, Knowl.-Based Syst.
- A fast feature selection approach based on rough set boundary regions, Pattern Recogn. Lett.
- A fast approach to attribute reduction in incomplete decision systems with tolerance relation-based rough sets, Inf. Sci.
- Rudiments of rough sets, Inf. Sci.
- Rough sets and boolean reasoning, Inf. Sci.
- Feature analysis through information granulation and fuzzy sets, Pattern Recogn.
- Measures for evaluating the decision performance of a decision table in rough set theory, Inf. Sci.
- Positive approximation: an accelerator for attribute reduction in rough set theory, Artif. Intell.
- An efficient accelerator for attribute reduction from incomplete data in rough set framework, Pattern Recogn.
- Framework for efficient feature selection in genetic algorithm based data mining, Eur. J. Oper. Res.
- Selecting informative features with fuzzy-rough sets and its application for complex systems monitoring, Pattern Recogn.
- Data compression with homomorphism in covering information systems, Int. J. Approx. Reason.
- Fuzzy information systems and their homomorphisms, Fuzzy Sets Syst.
- t-Test feature selection approach based on term frequency for text categorization, Pattern Recogn. Lett.
- A comparative study of rough sets for hybrid data, Inf. Sci.