Neurocomputing

Volume 71, Issues 4–6, January 2008, Pages 1092–1100

Letters
A novel condensing tree structure for rough set feature selection

https://doi.org/10.1016/j.neucom.2007.09.003

Abstract

The rough set approach is one of the effective feature selection methods that can preserve the meaning of the features. The essence of rough-set-based feature selection is to find a subset of the original features (attributes) using rough set theory. Many feature selection (also called feature reduction) methods based on rough sets have been proposed, and numerous experimental results have demonstrated that the methods based on the discernibility matrix are concise and efficient, but suffer from high space complexity. In order to reduce the storage space required by the existing discernibility-matrix-based feature selection methods, this paper introduces a novel condensing tree structure (C-Tree), an extended order-tree in which every non-empty element of a discernibility matrix is mapped to one path, and many non-empty elements may share the same path or prefix; the C-Tree therefore has much lower space complexity than the discernibility matrix. Moreover, our feature selection algorithms employ the C-Tree structure and incorporate some heuristic strategies, and hence efficiently reduce both space and computational complexities. The algorithms are evaluated on standard and synthetic datasets with respect to both time and space complexity. Experimental results show that they efficiently reduce the cost of storage and are computationally inexpensive compared to the existing discernibility-matrix-based feature reduction algorithms.

Introduction

Feature selection (also called feature reduction or attribute reduction) is one of the most fundamental problems in the field of machine learning. It is defined as the process of selecting relevant features out of a larger set of candidate features, where the relevant features are those that describe the target task. In the data mining field, feature selection techniques have become increasingly essential for reducing the cost of computation and storage and for improving the accuracy of prediction [12]. As Liu pointed out in [12], the motivation for feature selection in data mining and machine learning is to reduce the dimensionality of the feature space, speed up and reduce the cost of a learning algorithm, improve the predictive accuracy of a classification algorithm, and improve the visualization and comprehensibility of the induced concepts. In particular, the authors of [12] have emphasized that not every feature selection method can serve all of these purposes.

Rough set theory (RST), proposed by Pawlak [13], is a sound mathematical theory for dealing with imprecise, uncertain, and vague information. It has been widely applied in many fields such as machine learning [18], data mining [12], intelligent data analysis, and control algorithm acquisition [1], [19]. RST provides two fundamental concepts for this problem: the reduct (a minimal feature subset) and the core. Many methods are available for computing cores and reducts based on rough sets [2], [3], [4], [6], [7], [8], [9], [10], [14], [16], [17], [20], [22], [24]. Among them, methods based on the discernibility matrix are of considerable benefit [2], [3], [4], [6], [8], [9], [10], [16], [17], [20], [22], [24], because each entry m_ij of the matrix, corresponding to objects x_i and x_j, contains the conditional features on which the two objects' values differ. In particular, any feature corresponding to an entry that consists of exactly one feature must belong to the core (the intersection of all reducts), so the core can be acquired quickly from the discernibility matrix structure [22], [24]. At the same time, some efficient feature selection algorithms have been designed using the discernibility matrix structure together with various optimization techniques, such as the feature-order strategy [10], heuristic search methods [2], [11], [20], the marginal relative dependency approach [4], and others [9], [17]; these algorithms can efficiently reduce the computational complexity. In essence, most algorithms in [2], [3], [4], [6], [8], [10], [16], [20], [22], [24] are variants of Johnson's reduct. Some of them have been applied to textual case-based classification systems [4] and other classifier designs [3], [14], improving classification accuracy and reducing computational complexity; e.g., the Johnson reduct algorithm in [4] is an order of magnitude faster than information gain while providing comparable classification performance, and the feature selection approach of [14] also appears to work well for rough set classifiers. However, these feature selection algorithms have a high storage cost for very large decision tables, because the space complexity of a discernibility matrix is O(m·n²) for a decision table with n objects and m conditional features. Although this cost can be reduced to some extent by randomly partitioning the training set [4], the discernibility matrix of each partition still has a high space complexity when the partition contains many objects.
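To make the role of the discernibility matrix and the core concrete, here is a minimal Python sketch (our illustration, not the authors' code; the toy table and all names are hypothetical) that builds the matrix entries for a small decision table and reads off the core from the singleton entries:

```python
from itertools import combinations

def discernibility_matrix(objects, cond_features, decision):
    """For each pair of objects with different decision values, record the
    conditional features on which their values differ."""
    entries = {}
    for (i, x), (j, y) in combinations(enumerate(objects), 2):
        if x[decision] != y[decision]:  # only discern objects of different classes
            diff = frozenset(a for a in cond_features if x[a] != y[a])
            if diff:
                entries[(i, j)] = diff
    return entries

def core(entries):
    """A singleton entry's feature must appear in every reduct, so the
    singleton entries collectively form the core."""
    return {next(iter(e)) for e in entries.values() if len(e) == 1}

# Toy decision table: conditional features a, b, c and decision feature d.
table = [
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
]
dm = discernibility_matrix(table, ["a", "b", "c"], "d")
print(dm)        # {(0, 1): frozenset({'b', 'c'}), (0, 2): frozenset({'a'})}
print(core(dm))  # {'a'}
```

Note that the pair (1, 2) produces no entry because both objects carry the same decision value; only pairs from different classes need to be discerned.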

The main motivation of this study is thus to design a novel structure that can store all the non-empty entries of a discernibility matrix while substantially reducing the required storage space.

To reduce the storage space of the existing discernibility-matrix-based feature selection methods, this paper introduces a novel condensing tree structure (C-Tree), an extended order-tree in which every non-empty entry of a discernibility matrix is mapped to one path, and many non-empty entries may share the same path or prefix; the C-Tree therefore has much lower space complexity than the discernibility matrix. Moreover, our feature selection algorithms employ the C-Tree structure and incorporate some heuristic strategies, and hence efficiently reduce both space and computational complexities. The algorithms are evaluated on standard and synthetic datasets with respect to both time and space complexity. Experimental results show that they efficiently reduce the cost of storage and are computationally inexpensive compared to the existing discernibility-matrix-based feature reduction algorithms.

The rest of this paper is organized as follows. In Section 2, some basic concepts of RST are briefly introduced. In Section 3, the C-Tree and its generation approach are presented. Two C-Tree-based rough set feature selection algorithms are developed in Section 4. Experimental comparisons are presented in Section 5. Finally, Section 6 gives our conclusions and several issues for future work.

Section snippets

Preliminary concepts of RST

This section introduces only those definitions from RST that are essential for feature reduction; more details and formal definitions can be found in [15], [19].

In RST, a dataset can be formally described using an information system [15], [19] (also called a decision table in this paper). An information system is denoted as IS = 〈U, Q, V, f〉, where U = {x_1, x_2, …, x_n} is a non-empty finite set of objects or cases, called the universe, and Q is a non-empty finite set of features, Q = C∪D, where
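To illustrate these definitions, the short sketch below (our own, assuming the standard RST definitions of [15], [19]) computes the equivalence classes of the indiscernibility relation IND(P), the partition of the universe U induced by a feature subset P, which underlies the reduct and core notions used throughout the paper:

```python
from collections import defaultdict

def indiscernibility_classes(objects, P):
    """Partition the universe into equivalence classes of IND(P): objects
    that agree on every feature in P fall into the same class."""
    classes = defaultdict(list)
    for i, x in enumerate(objects):
        classes[tuple(x[a] for a in P)].append(i)
    return list(classes.values())

# The same toy table as before: objects 0 and 2 agree on b and c,
# so they are indiscernible under P = {b, c}.
table = [
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
]
print(indiscernibility_classes(table, ["b", "c"]))  # [[0, 2], [1]]
```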

Condensing tree (C-Tree)

As mentioned above, the space complexity of a discernibility matrix is O(m·n²) for a decision table with n objects and m conditional features. Clearly, when the number m of conditional features is fixed, the existing discernibility-matrix-based algorithms become infeasible as the number n of objects grows very large.

For a given discernibility matrix DM, we find that in most cases many entries of DM share common feature subsets; that is, some feature subsets may appear in many
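Although the snippet is truncated here, the observation it states motivates the C-Tree: entries that share feature subsets can share storage. The sketch below illustrates the underlying idea under our own assumptions: each entry, sorted by a fixed feature order, is inserted as a path in a trie with per-node counts, so identical or prefix-sharing entries reuse nodes. This is only an illustration of the principle, not the paper's exact C-Tree (which, per Algorithm 2, also maintains a header structure HT):

```python
class CTreeNode:
    def __init__(self, feature=None):
        self.feature = feature
        self.count = 0      # number of matrix entries whose path passes through here
        self.children = {}  # feature -> CTreeNode

def build_tree(entries, feature_order):
    """Insert each entry as a path, sorting its features by a fixed order
    so that entries with common subsets share prefixes (and hence nodes)."""
    root = CTreeNode()
    rank = {a: k for k, a in enumerate(feature_order)}
    for entry in entries:
        node = root
        for a in sorted(entry, key=rank.__getitem__):
            node = node.children.setdefault(a, CTreeNode(a))
            node.count += 1
    return root

# Three entries, two of them identical: {b, c} is stored once but counted twice.
entries = [{"b", "c"}, {"a"}, {"b", "c"}]
root = build_tree(entries, ["a", "b", "c"])
print({a: n.count for a, n in root.children.items()})  # {'b': 2, 'a': 1}
```

With such counts at hand, the highest-frequency feature needed by the heuristic algorithms can be read from the tree without rescanning the matrix.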

A heuristic feature selection algorithm based on C-Tree

Following the heuristic strategy of the JohnsonsReduct algorithm and using the C-Tree structure, a new feature selection algorithm is given below. It sequentially selects features, at each step choosing the one that discerns the most object pairs with respect to the decision feature.

Algorithm 2

JreductBtree(C, D, U)

Input C: conditional features, D: decision feature, U: objects

Output R: a feature reduct, R ⊆ C

1. R ← φ, A ← C
2. T ← GeneratingC-Tree(C, D, U)
3. do
4.   a ← selectHighestFrequencyfeature(T)
5.   R ← R ∪ {a}
6.   i ← locate(a, HT) // the position of feature a in the header
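The snippet truncates after step 6, but the overall control flow is the familiar Johnson-style greedy covering loop named above. The sketch below (our own; all function and variable names are illustrative) shows that loop run directly over the matrix entries rather than over the C-Tree; in the paper, the tree serves to make the frequency lookups and entry deletions cheap:

```python
from collections import Counter

def johnson_reduct(entries):
    """Repeatedly pick the feature occurring in the most uncovered entries,
    then drop every entry that feature covers, until all are covered."""
    remaining = [set(e) for e in entries]
    R = set()
    while remaining:
        freq = Counter(a for e in remaining for a in e)
        a = freq.most_common(1)[0][0]      # highest-frequency feature
        R.add(a)
        remaining = [e for e in remaining if a not in e]
    return R

entries = [{"b", "c"}, {"a"}, {"a", "b"}]
print(johnson_reduct(entries))  # e.g. {'a', 'b'} or {'a', 'c'}, depending on tie-breaks
```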

Experimental results

In order to compare the feature reduction algorithms given in this paper with the JohnsonsReduct algorithm based on the discernibility matrix (see, e.g., [4], [20]), we perform experiments on six publicly available datasets from the UCI repository (these datasets can be downloaded at http://www.ics.uci.edu) and five synthetic datasets. A brief description of the UCI datasets is given first: (1) Mushroom: 22 conditional features (22C), 1 decision feature (1D), 8124 objects, or (22C, 1D, 8124) for short;

Conclusions and future work

In this paper we introduced a novel C-Tree, which is generally highly compact: many non-empty entries of a given discernibility matrix for an information system may share a common path, or a common prefix of a path, so in most cases the space complexity can be reduced efficiently. Based on the C-Tree structure, two heuristic search algorithms for feature reduction are presented, which can efficiently obtain the corresponding heuristic information because all discernible

Acknowledgments

This work was partially supported by the National Natural Science Foundation of P.R. China under Grant no. 40771163, the Natural Science Foundation of Jiangsu Province under Grant no. BK2005135, and the Science Research Foundation of Jiangsu Province under Grant no. 05KJB520066.


References (24)

  • J.W. Guan et al., Rough computational methods for information systems, Artif. Intell. (1998)
  • R.W. Swiniarski et al., Rough set methods in feature selection and recognition, Pattern Recognition Lett. (2003)
  • U.M. Fayyad, L.P. Shapiro, P. Smyth, R. Uthurusamy, in: Menlo Park (Ed.), Advances in Knowledge Discovery and Data...
  • K.M. Gupta, P.G. Moore, D.W. Aha, S.K. Pal, Rough set feature selection methods for case-based categorization of text...
  • K.M. Gupta, W.A. David, M. Philip, Rough set feature selection algorithms for textual case-based classification, in:...
  • J.W. Han, J. Pei, Y. Yin, Mining frequent patterns without candidate generation, in: Proceedings of the ACM SIGMOD...
  • X.H. Hu et al., Learning in relational databases: a rough set approach, Computat. Intell. Int. J. (1995)
  • F. Hu, G.Y. Wang, H. Huang, Y. Wu, Incremental attribute reduction based on elementary sets, in: Proceedings of 10th...
  • R. Jensen et al., Semantics-preserving dimensionality reduction: rough and fuzzy-rough-based approaches, IEEE Trans. Knowl. Data Eng. (2004)
  • R. Jensen, Q. Shen, A. Tuson, Finding rough set reducts with SAT, in: Proceedings of the 10th International Conference...
  • W. Jue et al., Reduction algorithm based on discernibility matrix: the ordered attributes method, J. Comput. Sci. Technol. (2001)
  • Juzhen Dong, Ning Zhong, Setsuo Ohsuga, Using rough sets with heuristics for feature selection, in: N. Zhong, A....

Yang Ming received his Ph.D. degree from the Department of Computer Science and Engineering, Southeast University, Nanjing, in 2004. He received his M.S. degree from the Department of Mathematics, University of Science & Technology of China, in 1990, and his B.S. degree from the Department of Mathematics, Anhui Normal University, in 1987. He is currently a Professor in the Department of Computer Science at Nanjing Normal University. His research interests include data mining and knowledge discovery, machine learning, and rough set theory and its applications.

Yang Ping received her B.S. degree from the Department of Mathematics, Anhui Normal University, in 1989. She is currently an Associate Professor in the Department of Mathematics at Nanjing Normal University. Her research interests include multiple criteria decision-making and rough set theory and its applications.
