Accelerator for supervised neighborhood based attribute reduction
Introduction
Presently, the neighborhood rough set [15], [48] has gained substantial attention in the field of Granular Computing [11], [29], [41], [55], [57], [72], mainly because it provides an effective way to distinguish samples at different granularities [21], [23], [26], [30], [31], [33]. Because the distances between samples are fixed, different granularities can be obtained by using different values of radius. Therefore, radius is a key factor, and its value directly affects the neighborhood related results. For example, if a smaller radius is used, then more samples will be separated from each other; if a greater radius is used, then fewer samples will be separated from each other.
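The effect of the radius can be illustrated with a minimal sketch. It assumes Euclidean distance and a hypothetical helper named `neighborhood`; neither the function name nor the toy data comes from the paper.

```python
import math

def neighborhood(samples, i, radius):
    """Indices of all samples within `radius` of sample i (sample i included)."""
    return [j for j, y in enumerate(samples) if math.dist(samples[i], y) <= radius]

# Toy data (an assumption for illustration): four points on a line.
samples = [(0.0, 0.0), (0.1, 0.0), (0.5, 0.0), (1.0, 0.0)]

small = neighborhood(samples, 0, 0.2)  # smaller radius: finer granularity
large = neighborhood(samples, 0, 0.6)  # greater radius: coarser granularity
print(small)  # [0, 1]
print(large)  # [0, 1, 2]
```

With the smaller radius, sample 0 is separated from samples 2 and 3; with the greater radius, only sample 3 remains separated.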
Unfortunately, by using the neighborhood relation, samples with different labels may fall into the same neighborhood. This is mainly because the neighborhood relation obtained from a given radius does not always provide satisfactory discriminating performance. From this point of view, Yang et al. [63] have proposed a pseudo-label strategy for re-constructing the neighborhood relation; Liu et al. [32] have also designed a pseudo-label strategy by using the Label Propagation Algorithm, though that strategy is applied to semi-supervised learning. Nevertheless, generating pseudo labels of samples is time-consuming, and the information provided by pseudo labels may be incorrect, which will lower the quality of the neighborhood rough approximations. For instance, in Refs. [53], [63], the pseudo labels of samples are obtained by using the K-means clustering approach, and such a process takes considerable time. Moreover, due to the imprecise results of clustering [1], [2], samples with the same true label may be assigned different pseudo labels. To solve these problems, a new strategy for studying neighborhood rough set is needed.
In view of this, a novel strategy which considers the true labels of samples will be explored. Different from the pseudo-label approach, our new strategy is realized by using the true labels instead of the pseudo labels of samples. Without the process of generating pseudo labels, the time complexity of computing neighborhood rough set will not be increased. Furthermore, by using the true labels of samples, the case in which samples with the same label are separated from each other may be partially avoided. In the context of this paper, our new approach will be referred to as the supervised neighborhood based strategy. The main idea of our supervised neighborhood is to consider the true labels of samples and then analyze the relationship between samples with respect to two different cases, i.e., samples with the same label and samples with different labels. In other words, samples with the same label can be regarded as the case of intra-class [42] while samples with different labels can be regarded as the case of inter-class [42]. Consequently, it is reasonable to construct the neighborhood by using two different radii: 1) the intra-class radius is used to discriminate samples with the characteristic of intra-class; 2) the inter-class radius is used to discriminate samples with the characteristic of inter-class. Note that the value of the inter-class radius should be smaller than that of the intra-class radius. Otherwise, more samples with different labels will fall into the same neighborhood, and such a result is undesirable.
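The two-radius idea can be sketched as follows. This is one plausible reading of the description above, not the paper's formal definition: the function name `supervised_neighborhood`, the Euclidean distance and the toy data are all assumptions.

```python
import math

def supervised_neighborhood(samples, labels, i, intra_radius, inter_radius):
    """Neighbors of sample i under a label-aware rule (assumed reading of the text):
    same-label samples are compared against intra_radius,
    different-label samples against inter_radius."""
    if inter_radius > intra_radius:
        raise ValueError("inter-class radius should not exceed intra-class radius")
    result = []
    for j, y in enumerate(samples):
        limit = intra_radius if labels[j] == labels[i] else inter_radius
        if math.dist(samples[i], y) <= limit:
            result.append(j)
    return result

# Toy one-dimensional data (an assumption for illustration).
samples = [(0.0,), (0.3,), (0.25,), (0.6,)]
labels = ["a", "a", "b", "b"]

# Equal radii reproduce the ordinary single-radius neighborhood.
plain = supervised_neighborhood(samples, labels, 0, 0.4, 0.4)
# A smaller inter-class radius pushes the differently labeled sample 2 out.
supervised = supervised_neighborhood(samples, labels, 0, 0.4, 0.2)
print(plain)       # [0, 1, 2]
print(supervised)  # [0, 1]
```

In this toy case the same-label sample 1 stays in the neighborhood of sample 0, while sample 2, which is closer but carries a different label, is excluded.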
Up to now, it has been demonstrated that attribute reduction [4], [14], [27], [36], [43], [44], [52], [67], [70], [76] is one of the most important topics in the field of rough set [9], [10], [24], [37], [40], [45], [49], [51], [54], [56], [61], [62], [74], [75]. The reason can be attributed to the fact that attribute reduction provides a detailed explanation of which attributes are selected when reducing the dimensionality of data. From the viewpoint of neighborhood rough set, the structure of the neighborhood is closely related to the resulting reduct. For example, the reducts derived by using different values of radius may differ considerably. This is mainly because a smaller radius may generate a smaller neighborhood, which indicates that more samples will be separated from each other, whereas a greater radius may generate a greater neighborhood, which indicates that more samples will be regarded as indistinguishable. Thus, the following challenges in neighborhood rough set based attribute reduction should be addressed.
1. The radius used in neighborhood rough set may lead to a powerless reduct. This is mainly because such a radius does not take the information offered by the label into account, so the neighborhood related measures may be of lower quality. For example, given a radius, if the distance between two samples is less than or equal to the radius, then the two samples will fall into the same neighborhood even though their labels are different. Consequently, if the discriminating performance of the neighborhood relation is required to be preserved in attribute reduction, then the derived reduct will not distinguish these two samples either. From this point of view, it is worth studying the supervised neighborhood based attribute reduction.
2. The single radius based attribute reduction cannot reflect the variation tendency of the performances of reducts. That is, if only one radius is used, then only the reduct related to that radius can be obtained, and it is impossible to observe how the performances of reducts vary with the radius. For this reason, it is necessary to address the problem of attribute reduction in terms of multiple different radii.
3. The previous searching strategy may lead to higher time consumption if multiple radii related attribute reduction is considered. For instance, if the reducts based on n different radii are required, then the searching process will be repeated n times. If the number of radii is increased further, then the time consumption will also increase. Therefore, how to accelerate the process of finding multiple radii based reducts also deserves attention.
From the discussions above, considering the information provided by the true labels of samples, the supervised neighborhood based attribute reduction will be studied. Moreover, to further speed up the process of finding multiple radii based reducts, an acceleration strategy will be introduced into our searching process. Different from the traditional process, i.e., computing the reducts one by one with the same searching steps for each radius, our accelerator is realized by considering the variation of radius: the reduct obtained with the previous radius guides the search for the reduct with the current radius.
The main contributions of this work are as follows: 1) to improve the discriminating performance of the neighborhood relation, a supervised neighborhood relation will be proposed; 2) to obtain reducts with higher generalization performance, the supervised neighborhood based attribute reduction will be defined and explored; 3) to overcome the limitations of obtaining supervised neighborhood based reducts in terms of multiple different radii, an acceleration strategy will be proposed.
The rest of this paper is organized as follows. The basic notions of neighborhood rough set are introduced in Section 2. The supervised neighborhood relation, supervised neighborhood based attribute reduction and our acceleration strategy are proposed in Section 3. The experimental results and detailed comparisons are discussed in Section 4. Section 5 includes the conclusion and perspectives for future research.
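The contrast between the one-by-one process and the guided process can be sketched as below. This is a simplified illustration under stated assumptions, not the paper's algorithm: the names `quality`, `greedy_reduct` and `reducts_over_radii` are hypothetical, the measure is a stand-in (the fraction of samples whose neighborhood holds a single label), a single radius is used for brevity, and the search is a plain forward greedy one.

```python
import math

def quality(samples, labels, attrs, radius):
    """Fraction of samples whose neighborhood over `attrs` contains one label only."""
    attrs = sorted(attrs)  # fixed order keeps floating-point sums reproducible

    def dist(x, y):
        return math.sqrt(sum((x[a] - y[a]) ** 2 for a in attrs))

    pure = 0
    for x, lx in zip(samples, labels):
        neighbors = (ly for y, ly in zip(samples, labels) if dist(x, y) <= radius)
        if all(ly == lx for ly in neighbors):
            pure += 1
    return pure / len(samples)

def greedy_reduct(samples, labels, radius, seed=()):
    """Forward greedy search that starts from `seed` instead of the empty set."""
    all_attrs = list(range(len(samples[0])))
    target = quality(samples, labels, all_attrs, radius)
    chosen, evaluations = list(seed), 0
    while quality(samples, labels, chosen, radius) < target:
        candidates = [a for a in all_attrs if a not in chosen]
        if not candidates:
            break
        scores = [quality(samples, labels, chosen + [a], radius) for a in candidates]
        evaluations += len(candidates)  # counts candidate evaluations only
        chosen.append(candidates[scores.index(max(scores))])
    return chosen, evaluations

def reducts_over_radii(samples, labels, radii, accelerate=True):
    """Search once per radius; optionally seed each search with the previous result."""
    results, previous, total = {}, [], 0
    for radius in radii:
        seed = previous if accelerate else ()
        previous, used = greedy_reduct(samples, labels, radius, seed)
        results[radius] = previous
        total += used
    return results, total

# Toy data (an assumption for illustration): six samples, three attributes.
samples = [(0.1, 0.9, 0.5), (0.2, 0.8, 0.5), (0.8, 0.1, 0.5),
           (0.9, 0.2, 0.4), (0.15, 0.2, 0.6), (0.85, 0.85, 0.5)]
labels = ["a", "a", "b", "b", "b", "a"]
radii = [0.2, 0.4, 0.6]

for flag in (False, True):
    reducts, evaluations = reducts_over_radii(samples, labels, radii, accelerate=flag)
    print("accelerated" if flag else "naive", reducts, evaluations)
```

Seeding the search may leave redundant attributes in later results; a pruning pass would be needed to remove them, and this sketch omits it.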
Neighborhood relation
Generally, a decision system can be denoted as $DS = \langle U, AT \cup \{d\} \rangle$, in which $U$ is the set of samples, $AT$ is the set of condition attributes and $d$ is the decision attribute. $\forall x_i \in U$ and $\forall b \in AT$, $b(x_i)$ indicates the value of $x_i$ over condition attribute $b$, and $d(x_i)$ indicates the value of $x_i$ over decision attribute $d$, i.e., the label of $x_i$.
Given a decision system DS, assuming that the values of decision attribute are categorical, then an equivalence relation over d can be defined as
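The formula itself is not included in this snippet. A common way to write such a relation is given below, where the symbol $\mathrm{IND}(d)$ is an assumed name, $x_i$ and $x_j$ are samples in $U$, and $d(x)$ denotes the label of sample $x$:

```latex
\mathrm{IND}(d) = \{ (x_i, x_j) \in U \times U : d(x_i) = d(x_j) \}
```

Under this form, two samples are related exactly when they carry the same label, so the relation partitions $U$ into decision classes.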
Supervised neighborhood based rough set and attribute reduction
Following the traditional neighborhood relation shown in Eqs. (2) and (3), the size of the neighborhood is directly determined by the given radius and the distance between two samples. However, this method may be powerless in characterizing whether samples with different labels are similar or not. The reason may be attributed to the fact that two samples with different labels will fall into the same neighborhood if an unsuitable radius is employed. To further improve the discriminating
Experimental analyses
To demonstrate the effectiveness of our supervised neighborhood strategy and accelerator, 12 UCI data sets are selected to conduct the experiments. The details of these data are shown in Table 1.
In the context of this paper, 5-fold cross validation is employed and 20 different intra-class radii are used, namely 0.05, 0.1, ⋯, 1; the corresponding inter-class radii are set to be “0.5”, “0.6”, “0.7”, “0.8”, “0.9”.
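The radius grid and the fold structure described above can be generated as in the following sketch. The helper `k_fold_indices` is hypothetical and splits deterministically, whereas a real experiment would usually shuffle the samples first. How the quoted inter-class values pair with the intra-class radii is not stated in this snippet, so they are only listed.

```python
# 20 intra-class radii: 0.05, 0.10, ..., 1.00 (rounded to avoid float drift).
intra_radii = [round(0.05 * k, 2) for k in range(1, 21)]

# Inter-class settings quoted in the text; their exact use is not specified here.
inter_settings = [0.5, 0.6, 0.7, 0.8, 0.9]

def k_fold_indices(n, k=5):
    """Return k (train, test) index pairs; every index is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [(sorted(set(range(n)) - set(fold)), fold) for fold in folds]

for train, test in k_fold_indices(10):
    print(train, test)
```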
Moreover, two types of the neighborhood based
Conclusions and future perspectives
In this paper, the framework of supervised neighborhood has been introduced into attribute reduction. Different from previous research, our supervised neighborhood is constructed by using an intra-class radius and an inter-class radius, which takes the information offered by the label into account. Furthermore, to find the corresponding reducts over multiple radii, an acceleration strategy has been designed. Different from the naive strategy to obtain reducts in terms of multiple different radii, our
Declaration of Competing Interest
We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.
Acknowledgement
This work is supported by the Natural Science Foundation of China (Nos. 61572242, 61906078), the Key Laboratory of Data Science and Intelligence Application, Fujian Province University (No. D1901) and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX19_1715).
References (76)
- et al. A three-way clustering approach for handling missing data using GTRS. Int. J. Approx. Reason. (2018)
- et al. An incremental algorithm for attribute reduction with variable precision rough sets. Appl. Soft Comput. (2016)
- et al. Attribute selection base on a new conditional entropy for incomplete decision systems. Knowl.-Based Syst. (2013)
- et al. Quick general reduction algorithms for inconsistent decision tables. Int. J. Approx. Reason. (2017)
- et al. Mixed feature selection based on granulation and approximation. Knowl.-Based Syst. (2008)
- et al. Neighborhood rough set based heterogeneous feature subset selection. Inf. Sci. (2008)
- et al. Neighborhood classifiers. Expert Syst. Appl. (2008)
- et al. Gaussian kernel based fuzzy rough sets: model, uncertainty measures and applications. Int. J. Approx. Reason. (2010)
- et al. Sequential three-way classifier with justifiable granularity. Knowl.-Based Syst. (2019)
- et al. Generalized attribute reduct in rough set theory. Knowl.-Based Syst. (2016)
- Accelerator for multi-granularity attribute reduction. Knowl.-Based Syst.
- Cost-sensitive rough set approach. Inf. Sci.
- An evidential analytics for buried information in big data samples: case study of semiconductor manufacturing. Inf. Sci.
- Concept learning via granular computing: a cognitive viewpoint. Inf. Sci.
- A comparative study of multigranulation rough sets and concept lattices via rule acquisition. Knowl.-Based Syst.
- Rough set based semi-supervised feature selection via ensemble selector. Knowl.-Based Syst.
- An efficient selector for multi-granularity attribute reduction. Inf. Sci.
- Test-cost-sensitive attribute reduction. Inf. Sci.
- Feature selection with test cost constraint. Int. J. Approx. Reason.
- An efficient accelerator for attribute reduction from incomplete data in rough set framework. Pattern Recognit.
- Positive approximation: an accelerator for attribute reduction in rough set theory. Artif. Intell.
- A multiple-valued logic approach for multigranulation rough set model. Int. J. Approx. Reason.
- Generation of rough sets reducts and constructs based on inter-class and intra-class information. Fuzzy Sets Syst.
- Fuzzy rough set-based attribute reduction using distance measures. Knowl.-Based Syst.
- On rule acquisition in incomplete multi-scale decision tables. Inf. Sci.
- Compacted decision tables based attribute reduction. Knowl.-Based Syst.
- Generalized multigranulation double-quantitative decision-theoretic rough set. Knowl.-Based Syst.
- A novel approach to information fusion in multi-source datasets: a granular computing viewpoint. Inf. Sci.
- Prediction of protein structural classed by decreasing nearest neighbor error rate
- Multi-label learning with label-specific feature reduction. Knowl.-Based Syst.
- A sequential three-way approach to multi-class decision. Int. J. Approx. Reason.
- Pseudo-label neighborhood rough set: measures and attribute reductions. Int. J. Approx. Reason.
- Test cost sensitive multigranulation rough set: model and minimal cost selection. Inf. Sci.
- Updating multigranulation rough approximations with increasing of granular structures. Knowl.-Based Syst.
- Ensemble selector for attribute reduction. Appl. Soft Comput.
- Class-specific attribute reducts in rough set theory. Inf. Sci.
- Discernibility matrix simplification for constructing attribute reducts. Inf. Sci.
- A dynamic three-way decision model based on the updating of attribute values. Knowl.-Based Syst.