Elsevier

Information Sciences

Volume 505, December 2019, Pages 457-472
An efficient selector for multi-granularity attribute reduction

https://doi.org/10.1016/j.ins.2019.07.051

Highlights

  • Multi-granularity is considered in attribute reduction to improve classification performance.

  • A multi-granularity attribute selector is proposed to accelerate the search for a reduct.

  • The proposed selector is both efficient and effective.

Abstract

Presently, the mechanism of multi-granularity has frequently been realized by various mathematical tools in Granular Computing, especially rough set theory. Nevertheless, attribute reduction, a key topic of rough set theory, has rarely been explored from the perspective of multi-granularity. To fill this gap, Multi-Granularity Attribute Reduction is defined to characterize a reduct that satisfies an intended multi-granularity constraint rather than a constraint based on one and only one granularity. Furthermore, to accelerate the search for such a reduct, a Multi-Granularity Attribute Selector is introduced into the framework of the heuristic algorithm. Its key procedure is twofold: (1) fuse the measure-values over all granularities to construct the multi-granularity constraint; (2) integrate the measure-values at suitable granularities to evaluate the candidate attributes. Based on the multi-granularity structure formed by the neighborhood rough set, experimental results over 20 UCI data sets demonstrate that, compared with single granularity attribute reduction, our selector can not only generate reducts that may not lead to poorer classification performance, but also significantly reduce the elapsed time of computing reducts. This research suggests a new direction for attribute reduction in multi-granularity environments.

Introduction

Attribute reduction [2], [14], [21], a rough-set-based form of feature selection, aims to reduce the dimensionality of data by searching for a qualified reduct, i.e., a subset of conditional attributes. For this purpose, as reported by Yao et al. [43], a reduct is generally required to satisfy one intended constraint. Correspondingly, with respect to different constraints, various forms of attribute reduction have been explored [3], [4], [9], [13], [18].

A careful review of previous research from the viewpoint of Granular Computing (GrC) [11], [26], [36] reveals that most attribute reduction approaches share a similar mechanism: to form a suitable constraint for the corresponding attribute reduction, information granulation [12], [22], [37] is frequently conducted beforehand by using an indiscernibility relation [21], a distance function [47], clustering analysis [10], etc. It should be emphasized that different results of information granulation may contribute to different reducts, because as the results of information granulation vary, the constraint related to the reduct may become stricter or looser. An immediate problem is how to distinguish different series of results, including information granulation, constraint and even reduct. Regarding this problem, fortunately, the concept of granularity [24] can be used to quantitatively characterize these results and reveal the difference.

Actually, the concept of granularity can be acquired through various approaches. To the best of our knowledge, most previous research related to granularity falls into the following three strategies.

  • Parameter based granularity. The granularity is closely related to an appointed parameter. In this case, to distinguish different results of information granulation, the immediate granularity is determined by the value of the parameter; it follows that the difference between those results of information granulation can be reflected by the difference between parameter values. It should be noticed that a smaller parameter value generally contributes to a finer granularity. For instance, in the neighborhood rough set [6], a smaller parameter value (the radius for calculating neighborhoods of samples) may generate smaller neighborhoods of samples, and hence a finer granularity. A similar explanation can also be observed in the Gaussian kernel based fuzzy rough set [7].

  • Sample based granularity. The granularity is closely related to the employed samples. In this case, to distinguish different results of information granulation, the immediate granularity is determined by the structure of the samples; it follows that the difference between those results of information granulation can be reflected by the difference between sample structures. For instance, in K-fold cross-validation, K different sets of training and testing samples can be regrouped [8]. From the viewpoint of GrC, K results of information granulation are derived. Immediately, the difference between these results can be reflected by the structures of the samples, since for those regrouped sets, the contained training and testing samples differ significantly.

  • Attribute based granularity. The granularity is closely related to the considered attributes. In this case, to distinguish different results of information granulation, the immediate granularity is determined by the distinguishability of the attributes; it follows that the difference between those results of information granulation can be reflected by the difference between the distinguishabilities of the attributes. As suggested by Xu et al. [35], different attributes with different distinguishabilities induce multiple equivalence relations, and those relations may naturally construct multiple different granularities. Additionally, it should be noticed that stronger distinguishability of attributes generally contributes to a finer granularity. For instance, as reported by Liao et al. [16], in the problem of feature selection, the considered attributes (conditional attributes, i.e., features) are supposed to have a feature-value granularity. Compared with the deleted attributes, the retained attributes may have stronger distinguishability, since they may provide stronger relevance or better generalization performance; it follows that a finer granularity will be obtained.
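As a minimal illustration of the parameter based case, the sketch below computes δ-neighborhoods under the Euclidean distance; the data and the use of `numpy` here are our own illustrative assumptions, not the paper's experimental setup. Shrinking the radius δ shrinks every neighborhood, i.e., yields a finer granularity.

```python
import numpy as np

def neighborhoods(X, delta):
    """Return each sample's delta-neighborhood as an index array."""
    # pairwise Euclidean distances between all samples
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return [np.flatnonzero(row <= delta) for row in dists]

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.9, 1.0]])
fine = neighborhoods(X, 0.05)   # small radius: each sample is alone -> finer granularity
coarse = neighborhoods(X, 2.0)  # large radius: every sample sees all -> coarser granularity
assert all(len(f) <= len(c) for f, c in zip(fine, coarse))
```

The monotonicity checked by the final assertion is exactly the radius/granularity relationship described above: a smaller δ can only remove samples from a neighborhood, never add them.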

Following these different forms of granularity, it is clear that most existing attribute reduction approaches are essentially based on a single granularity; they are referred to as single granularity attribute reductions in this paper. This is mainly because the construction of the constraint focuses on one and only one fixed result of information granulation, induced by a single parameter, a single sample space or a single attribute set. Nevertheless, single granularity attribute reduction suffers from the following limitations.

  • 1.

    Single granularity attribute reduction may result in poor adaptability of the derived reduct to the problem of granularity diversity. As reported by Yang and Yao [41], if a reduct is generated over one and only one considered granularity, it may no longer be a qualified reduct over a slightly finer or coarser granularity, which may be caused by slight variation of the data.

  • 2.

    Single granularity attribute reduction may increase the time consumption if multiple different granularities [44] are required. For example, given a set of multiple different granularities, a simple and direct design is to repeat single granularity attribute reduction once for each considered granularity. Obviously, such a process is too time-consuming.

To overcome these limitations, it is necessary to adopt a new line of thinking: reconsider attribute reduction in the multi-granularity case. From this point of view, Multi-Granularity Attribute Reduction (MGAR) is proposed. Fig. 1(a) shows the framework of single granularity attribute reduction, while the framework of the proposed MGAR is illustrated in Fig. 1(b).

As Fig. 1 shows, different from single granularity attribute reduction, MGAR requires two or more information granulations with respect to multiple different granularities. Moreover, based on these results of information granulation, a specific constraint, called the multi-granularity constraint, is established, and then attribute reduction can be conducted. As the most distinctive characteristic of MGAR, this multi-granularity constraint allows us to select attributes in a more versatile way. It follows that the derived reduct may offer higher adaptability to a multi-granularity world.

Given the definition of MGAR, how to realize it in detail becomes an interesting topic that deserves further investigation. Actually, considering multiple different granularities in MGAR may easily increase the complexity of the search for a reduct. Accordingly, an efficient selector for MGAR, referred to as the Multi-Granularity Attribute Selector (MGAS) in this paper, is proposed to alleviate this weakness. The selector is expected to speed up the search for a multi-granularity reduct. The acceleration mechanism of MGAS addresses the following two open problems.

  • 1.

    How to construct the multi-granularity constraint. To address this problem, a fused measure-value based multi-granularity constraint is designed. This strategy avoids an undesired case in which the intended multi-granularity constraint is so strict that no attribute can be deleted. Additionally, it reduces the number of constraints with respect to multiple different granularities.

  • 2.

    How to evaluate the candidate attributes. To address this problem, evaluations of attributes based on the finest and coarsest granularities are used. With this strategy, some redundant evaluations are pruned; it follows that the elapsed time of evaluating attributes is effectively reduced.
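The two ideas above can be sketched as a greedy forward search. This is a hypothetical reconstruction, not the paper's exact formulation: the quality `measure`, the mean-based fusion, and the 0.95 threshold are all illustrative assumptions.

```python
import statistics

def mgas(attrs, radii, measure, threshold=0.95):
    """Greedy forward search sketch for a multi-granularity reduct.

    `measure(subset, radius)` is any granularity-dependent quality
    measure in [0, 1] (hypothetical placeholder, e.g. a neighborhood
    dependency degree).  The multi-granularity constraint is fused by
    averaging over all radii; candidates are ranked using only the
    finest and coarsest radii, pruning intermediate evaluations.
    """
    def fused(subset):
        return statistics.mean(measure(subset, r) for r in radii)

    target = threshold * fused(attrs)         # fused constraint
    finest, coarsest = min(radii), max(radii)
    reduct, rest = [], list(attrs)
    while rest and fused(reduct) < target:
        # evaluate each candidate only at the two extreme granularities
        best = max(rest, key=lambda a: measure(reduct + [a], finest)
                                       + measure(reduct + [a], coarsest))
        reduct.append(best)
        rest.remove(best)
    return reduct

# toy measure: attributes 0 and 1 are jointly sufficient at every radius
def toy_measure(subset, radius):
    need = {0, 1}
    return len(need & set(subset)) / len(need)

print(mgas([0, 1, 2, 3], [0.1, 0.2, 0.3], toy_measure))  # -> [0, 1]
```

Note how the fused constraint is a single threshold rather than one constraint per granularity, and how each candidate costs two measure evaluations instead of one per radius, which is where the pruning described above comes from.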

The main contributions of this work can be summarized as follows: (1) by analyzing attribute reduction from the viewpoint of GrC, we show that most previous attribute reduction approaches are essentially based on a single granularity; (2) to overcome the inherent limitations of single granularity attribute reduction, Multi-Granularity Attribute Reduction is proposed; (3) to accelerate the search for a multi-granularity reduct, the Multi-Granularity Attribute Selector is designed; (4) extensive experimental results demonstrate that our selector is effective in classification-oriented attribute reduction and efficient in computing reducts.

The remainder of this paper is organized as follows. Section 2 briefly reviews related work on granularity. Preliminary knowledge is presented in Section 3. Multi-Granularity Attribute Reduction (MGAR) and the Multi-Granularity Attribute Selector (MGAS) are introduced in Section 4. Section 5 presents comparative experimental results over UCI data sets, together with the corresponding analyses. The paper ends with conclusions and future perspectives in Section 6.

Section snippets

Granular computing and rough set

By applying the mechanism of information granulation to problem solving, Granular Computing (GrC) has developed into an umbrella covering theories, methodologies and techniques related to the concepts of granule and granularity [17], [38], [39]. As basic elements of GrC, the concepts of granule and granularity have been thoroughly investigated [24], [31]. In a broad sense, granule and granularity can be described as follows.

  • Granule: a collection of entities drawn together by

MGAR: multi-granularity attribute reduction

In retrospect, most attribute reductions are single granularity based. However, as pointed out in Section 1, single granularity attribute reduction may involve several inherent limitations: (1) it may result in poor adaptability of the generated reduct to the problem of granularity diversity; (2) it may increase the time consumption if multiple different granularities are required. To solve these problems, Multi-Granularity Attribute Reduction (MGAR) is proposed in

Data sets

To demonstrate the effectiveness of the proposed MGAS (Algorithm 2), 20 real-world data sets from the UCI Machine Learning Repository are employed in this paper. Table 1 summarizes detailed statistics of the data sets used in our experiments. Note that all of the data sets are used for classification tasks, have numerical attribute values, and have been normalized by column.
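The column-wise normalization mentioned above is presumably min-max scaling of each attribute into [0, 1], which keeps neighborhood radii comparable across attributes; the exact scheme is our assumption, and this sketch shows one common realization.

```python
import numpy as np

def normalize_columns(X):
    """Min-max normalize each attribute (column) into [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    rng = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (X - lo) / rng

X = np.array([[1.0, 10.0],
              [3.0, 30.0],
              [2.0, 20.0]])
Xn = normalize_columns(X)  # each column now spans [0, 1]
```

Without such scaling, an attribute with a large value range would dominate the Euclidean distances used to build neighborhoods, distorting the granularity structure.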

Experimental setup and configuration

All the experiments have been carried out on a personal computer with Windows 10, Intel Core 2

Conclusions and future perspectives

Different from previous research, attribute reduction is explored in this paper with the consideration of multi-granularity. Following this new line of thinking, Multi-Granularity Attribute Reduction (MGAR) is defined, and to realize MGAR feasibly, an efficient selector referred to as the Multi-Granularity Attribute Selector (MGAS) is designed. Based on the neighborhood rough set, which naturally forms a multi-granularity structure through different radii, the experimental results over 20 UCI data

Declaration of interest statement

We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

Acknowledgments

The authors would like to thank the editors and anonymous reviewers for their constructive comments. This work is supported by the Natural Science Foundation of China (No. 61572242), the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX19_1715) and the Key Laboratory of Data Science and Intelligence Application, Fujian Province University (No. D1901).

References (47)
