Neurocomputing

Volume 71, Issues 4–6, January 2008, Pages 1092–1100

Letters
A novel condensing tree structure for rough set feature selection

https://doi.org/10.1016/j.neucom.2007.09.003

Abstract

The rough set approach is one of the effective feature selection methods that can preserve the meaning of the features. The essence of rough-set-based feature selection is to find a subset of the original features (attributes) using rough set theory. Many feature selection (also called feature reduction) methods based on rough sets have been proposed, and numerous experimental results have demonstrated that the methods based on the discernibility matrix are concise and efficient, but suffer from high space complexity. In order to reduce the storage space required by the existing discernibility-matrix-based feature selection methods, this paper introduces a novel condensing tree structure (C-Tree), an extended order-tree in which every non-empty element of a discernibility matrix is mapped to one path, and many non-empty elements may share the same path or prefix; the C-Tree therefore has much lower space complexity than the discernibility matrix. Moreover, our feature selection algorithms employ the C-Tree structure and incorporate some heuristic strategies, and hence efficiently reduce both space and computational complexities. The algorithms are evaluated on standard and synthetic datasets with respect to both time and space complexity. Experimental results show that they efficiently reduce the cost of storage and are computationally inexpensive compared to the existing discernibility-matrix-based feature reduction algorithms.

Introduction

Feature selection (also called feature reduction or attribute reduction) is one of the most fundamental problems in the field of machine learning. It is defined as the process of selecting relevant features out of a larger set of candidate features, where the relevant features are those that describe the target task. In the data mining field, feature selection techniques have become increasingly essential for reducing the cost of computation and storage and for improving the accuracy of prediction [12]. As Liu pointed out in [12], the motivation for feature selection in data mining and machine learning is to reduce the dimensionality of the feature space, speed up and reduce the cost of a learning algorithm, improve the predictive accuracy of a classification algorithm, and improve the visualization and comprehensibility of the induced concepts. In particular, the authors of [12] have emphasized that not every feature selection method can serve all of these purposes.

Rough set theory (RST), proposed by Pawlak [13], is a sound mathematical theory for dealing with imprecise, uncertain, and vague information. It has been widely applied in many fields such as machine learning [18], data mining [12], intelligent data analysis, and control algorithm acquisition [1], [19]. RST provides two fundamental concepts for this problem: the reduct (a minimal feature subset) and the core. Many methods are available for computing cores and reducts based on rough sets [2], [3], [4], [6], [7], [8], [9], [10], [14], [16], [17], [20], [22], [24]. Among them, methods based on the discernibility matrix are of considerable benefit [2], [3], [4], [6], [8], [9], [10], [16], [17], [20], [22], [24], because each entry m_ij of the matrix, corresponding to objects x_i and x_j, contains the conditional features on which the two objects' values differ. In particular, any feature corresponding to an entry that consists of exactly one feature must belong to the core (the intersection of all reducts), so the core can be acquired quickly from the discernibility matrix structure [22], [24]. At the same time, some efficient feature selection algorithms have been designed using the discernibility matrix structure together with various optimization techniques, such as the feature-order strategy [10], heuristic search methods [2], [11], [20], the marginal relative dependency approach [4], and others [9], [17]; these algorithms can efficiently reduce the computational complexity. In essence, most algorithms in [2], [3], [4], [6], [8], [10], [16], [20], [22], [24] are variants of Johnson's reduct. Some of them have been applied to textual case-based classification systems [4] and other classifier designs [3], [14], improving classification accuracy and reducing computational complexity; e.g., the Johnson reduct algorithm in [4] is an order of magnitude faster than information gain while providing comparable classification performance, and the feature selection approach of [14] also appears to work well for rough set classifiers. However, these feature selection algorithms have a high storage cost for very large decision tables, because the space complexity of a discernibility matrix is O(m·n²) for a decision table with n objects and m conditional features. Although this cost can be reduced to some extent by randomly partitioning the training set [4], the discernibility matrix of each partition still has a high space complexity when the partition contains many objects.
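To make the role of the discernibility matrix and the core concrete, here is a minimal Python sketch (our illustration, not the authors' code; the toy table and all names are hypothetical) that builds the matrix entries for a small decision table and reads off the core from the singleton entries:

```python
from itertools import combinations

def discernibility_matrix(objects, cond_features, decision):
    """For each pair of objects with different decision values, record the
    conditional features on which their values differ."""
    entries = {}
    for (i, x), (j, y) in combinations(enumerate(objects), 2):
        if x[decision] != y[decision]:  # only discern objects of different classes
            diff = frozenset(a for a in cond_features if x[a] != y[a])
            if diff:
                entries[(i, j)] = diff
    return entries

def core(entries):
    """A singleton entry's feature must appear in every reduct, so the
    singleton entries collectively form the core."""
    return {next(iter(e)) for e in entries.values() if len(e) == 1}

# Toy decision table: conditional features a, b, c and decision feature d.
table = [
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
]
dm = discernibility_matrix(table, ["a", "b", "c"], "d")
print(dm)        # {(0, 1): frozenset({'b', 'c'}), (0, 2): frozenset({'a'})}
print(core(dm))  # {'a'}
```

Note that the pair (1, 2) produces no entry because both objects carry the same decision value; only pairs from different classes need to be discerned.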

The main motivation of this study is thus to design a novel structure that can store all the non-empty entries of a discernibility matrix while substantially reducing the required storage space.

To reduce the storage space of the existing discernibility-matrix-based feature selection methods, this paper introduces a novel condensing tree structure (C-Tree), an extended order-tree in which every non-empty entry of a discernibility matrix is mapped to one path, and many non-empty entries may share the same path or prefix; the C-Tree therefore has much lower space complexity than the discernibility matrix. Moreover, our feature selection algorithms employ the C-Tree structure and incorporate some heuristic strategies, and hence efficiently reduce both space and computational complexities. The algorithms are evaluated on standard and synthetic datasets with respect to both time and space complexity. Experimental results show that they efficiently reduce the cost of storage and are computationally inexpensive compared to the existing discernibility-matrix-based feature reduction algorithms.

The rest of this paper is organized as follows. In Section 2, some basic concepts of RST are briefly introduced. In Section 3, the C-Tree and its generation approach are presented. Two C-Tree-based rough set feature selection algorithms are developed in Section 4. Experimental comparisons are presented in Section 5. Finally, Section 6 gives our conclusions and several issues for future work.

Section snippets

Preliminary concepts of RST

This section introduces only those definitions from RST that are essential for feature reduction; more details and formal definitions can be found in [15], [19].

In RST, a dataset can be formally described using an information system [15], [19] (also called a decision table in this paper). An information system is denoted as IS = 〈U, Q, V, f〉, where U = {x_1, x_2, …, x_n} is a non-empty finite set of objects or cases, called the universe, and Q is a non-empty finite set of features, Q = C∪D, where
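To illustrate these definitions, the short sketch below (our own, assuming the standard RST definitions of [15], [19]) computes the equivalence classes of the indiscernibility relation IND(P), the partition of the universe U induced by a feature subset P, which underlies the reduct and core notions used throughout the paper:

```python
from collections import defaultdict

def indiscernibility_classes(objects, P):
    """Partition the universe into equivalence classes of IND(P): objects
    that agree on every feature in P fall into the same class."""
    classes = defaultdict(list)
    for i, x in enumerate(objects):
        classes[tuple(x[a] for a in P)].append(i)
    return list(classes.values())

# The same toy table as before: objects 0 and 2 agree on b and c,
# so they are indiscernible under P = {b, c}.
table = [
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
]
print(indiscernibility_classes(table, ["b", "c"]))  # [[0, 2], [1]]
```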

Condensing tree (C-Tree)

As mentioned above, the space complexity of a discernibility matrix is O(m·n²) for a decision table with n objects and m conditional features. Clearly, when the number m of conditional features is fixed, the existing discernibility-matrix-based algorithms become infeasible as the number n of objects grows very large.

For a given discernibility matrix DM, we find that in most cases many entries of DM share common feature subsets; that is, some feature subsets may appear in many
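Although the snippet is truncated here, the observation it states motivates the C-Tree: entries that share feature subsets can share storage. The sketch below illustrates the underlying idea under our own assumptions: each entry, sorted by a fixed feature order, is inserted as a path in a trie with per-node counts, so identical or prefix-sharing entries reuse nodes. This is only an illustration of the principle, not the paper's exact C-Tree (which, per Algorithm 2, also maintains a header structure HT):

```python
class CTreeNode:
    def __init__(self, feature=None):
        self.feature = feature
        self.count = 0      # number of matrix entries whose path passes through here
        self.children = {}  # feature -> CTreeNode

def build_tree(entries, feature_order):
    """Insert each entry as a path, sorting its features by a fixed order
    so that entries with common subsets share prefixes (and hence nodes)."""
    root = CTreeNode()
    rank = {a: k for k, a in enumerate(feature_order)}
    for entry in entries:
        node = root
        for a in sorted(entry, key=rank.__getitem__):
            node = node.children.setdefault(a, CTreeNode(a))
            node.count += 1
    return root

# Three entries, two of them identical: {b, c} is stored once but counted twice.
entries = [{"b", "c"}, {"a"}, {"b", "c"}]
root = build_tree(entries, ["a", "b", "c"])
print({a: n.count for a, n in root.children.items()})  # {'b': 2, 'a': 1}
```

With such counts at hand, the highest-frequency feature needed by the heuristic algorithms can be read from the tree without rescanning the matrix.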

A heuristic feature selection algorithm based on C-Tree

Following the heuristic strategy of the JohnsonsReduct algorithm and using the C-Tree structure, a new feature selection algorithm is given below. It sequentially selects features, at each step choosing the one that discerns the most object pairs with respect to the decision feature.

Algorithm 2

JreductBtree(C, D, U)

Input C: conditional features, D: decision feature, U: objects

Output R: a feature reduct, R ⊆ C

1. R ← φ, A ← C
2. T ← GeneratingC-Tree(C, D, U)
3. do
4.   a ← selectHighestFrequencyfeature(T)
5.   R ← R ∪ {a}
6.   i ← locate(a, HT) // the position of feature a in the header
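The snippet truncates after step 6, but the overall control flow is the familiar Johnson-style greedy covering loop named above. The sketch below (our own; all function and variable names are illustrative) shows that loop run directly over the matrix entries rather than over the C-Tree; in the paper, the tree serves to make the frequency lookups and entry deletions cheap:

```python
from collections import Counter

def johnson_reduct(entries):
    """Repeatedly pick the feature occurring in the most uncovered entries,
    then drop every entry that feature covers, until all are covered."""
    remaining = [set(e) for e in entries]
    R = set()
    while remaining:
        freq = Counter(a for e in remaining for a in e)
        a = freq.most_common(1)[0][0]      # highest-frequency feature
        R.add(a)
        remaining = [e for e in remaining if a not in e]
    return R

entries = [{"b", "c"}, {"a"}, {"a", "b"}]
print(johnson_reduct(entries))  # e.g. {'a', 'b'} or {'a', 'c'}, depending on tie-breaks
```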

Experimental results

In order to compare the feature reduction algorithms given in this paper with the JohnsonsReduct algorithm based on the discernibility matrix (see, e.g., [4], [20]), we perform experiments on six publicly available datasets from the UCI repository (these datasets can be downloaded at http://www.ics.uci.edu) and five synthetic datasets. A brief description of the UCI datasets is given first: (1) Mushroom: 22 conditional features (22C), 1 decision feature (1D), 8124 objects, or (22C, 1D, 8124) for short;

Conclusions and future work

In this paper we introduced a novel C-Tree, which is generally highly compact: many non-empty entries of a given discernibility matrix for an information system may share a common path, or a common prefix of a path, so in most cases the space complexity can be reduced efficiently. Based on the C-Tree structure, two heuristic search algorithms for feature reduction are presented, which can efficiently obtain the corresponding heuristic information because all discernible

Acknowledgments

This work was partially supported by the National Natural Science Foundation of P.R. China under Grant no. 40771163, the Natural Science Foundation of Jiangsu Province under Grant no. BK2005135, and the Science Research Foundation of Jiangsu Province under Grant no. 05KJB520066.


References (24)

  • J.W. Guan et al., Rough computational methods for information systems, Artif. Intell. (1998)
  • R.W. Swiniarski et al., Rough set methods in feature selection and recognition, Pattern Recognition Lett. (2003)
  • U.M. Fayyad, L.P. Shapiro, P. Smyth, R. Uthurusamy, in: Menlo Park (Ed.), Advances in Knowledge Discovery and Data...
  • K.M. Gupta, P.G. Moore, D.W. Aha, S.K. Pal, Rough set feature selection methods for case-based categorization of text...
  • K.M. Gupta, W.A. David, M. Philip, Rough set feature selection algorithms for textual case-based classification, in:...
  • J.W. Han, J. Pei, Y. Yin, Mining frequent patterns without candidate generation, in: Proceedings of the ACM SIGMOD...
  • X.H. Hu et al., Learning in relational databases: a rough set approach, Computat. Intell. Int. J. (1995)
  • F. Hu, G.Y. Wang, H. Huang, Y. Wu, Incremental attribute reduction based on elementary sets, in: Proceedings of 10th...
  • R. Jensen et al., Semantics-preserving dimensionality reduction: rough and fuzzy-rough-based approaches, IEEE Trans. Knowl. Data Eng. (2004)
  • R. Jensen, Q. Shen, A. Tuson, Finding rough set reducts with SAT, in: Proceedings of the 10th International Conference...
  • W. Jue et al., Reduction algorithm based on discernibility matrix: the ordered attributes method, J. Comput. Sci. Technol. (2001)
  • Juzhen Dong, Ning Zhong, Setsuo Ohsuga, Using rough sets with heuristics for feature selection, in: N. Zhong, A....

Yang Ming received his Ph.D. degree from the Department of Computer Science and Engineering, Southeast University, Nanjing, in 2004. He received his M.S. degree from the Department of Mathematics, University of Science & Technology of China, in 1990, and his B.S. degree from the Department of Mathematics, Anhui Normal University, in 1987. He is currently a Professor in the Department of Computer Science at Nanjing Normal University. His research interests include data mining and knowledge discovery, machine learning, and rough set theory and its applications.

Yang Ping received her B.S. degree from the Department of Mathematics, Anhui Normal University, in 1989. She is currently an Associate Professor in the Department of Mathematics at Nanjing Normal University. Her research interests include multiple criteria decision-making and rough set theory and its applications.
