Consensus unsupervised feature ranking from multiple views

https://doi.org/10.1016/j.patrec.2007.11.012

Abstract

Feature ranking is a form of feature selection that orders the features by their relevance and importance to the problem at hand. The topic has been well studied for supervised classification, but very little work addresses unsupervised clustering, where the labels of the instances are unknown beforehand. Feature ranking for unsupervised clustering is therefore a challenging task: there are no instance labels to guide the computation of feature relevance. This paper explores feature ranking in the unsupervised clustering setting. We propose a novel consensus unsupervised feature ranking approach, termed unsupervised feature ranking from multiple views (FRMV). FRMV first obtains multiple rankings of all features from different views of the same data set and then aggregates them into a single consensus ranking. Experimental results on several real data sets demonstrate that FRMV is often able to identify a better feature ranking than that obtained by a single feature ranking approach.

Introduction

Data analysis often deals with complex data sets containing a large number of features (Schena et al., 1995, Baldi and Hatfield, 2002, Yu and Liu, 2004). For example, classification problems in molecular biology may involve thousands of features. Commonly, not all of these features are useful for the learning task: some are redundant, and some are irrelevant or noisy, which may even degrade the performance of the learning algorithm. More seriously, the high dimensionality of a data set causes several problems of its own. Data instances in high dimensions become very sparse, and most of them appear equally far from the centroids of clusters, so the performance of any learning algorithm that uses distance to measure the similarity of instances may degrade significantly (Fukunaga, 1990, Blum and Langley, 1997). To remove noisy features and mitigate the curse of dimensionality, an important step in analyzing high-dimensional data sets is selecting a meaningful subset of the features. This process is commonly termed feature selection, and is also called variable selection (Liu and Motoda, 1998). Good feature selection offers several advantages for a learning algorithm, such as lower computational cost, better classification accuracy, and more comprehensible results.

However, feature selection is not an easy task. Finding a good subset of the feature vector can require an exponential number of evaluations, which is intractable when the data set has a large number of features (Liu and Yu, 2005, Pudil et al., 1994, Kim et al., 2000, Kudo and Sklansky, 2000, Debuse and Rayward-Smith, 1997). Another important issue is how to evaluate a candidate feature subset (Liu and Yu, 2005, Kim et al., 2000). Traditional feature selection approaches are supervised, assuming the labels of all instances are known beforehand. When wrapper approaches are adopted and labels are available, a candidate feature subset is evaluated by the classification accuracy it yields on unseen instances (Kohavi and John, 1997): the data set is divided into a training set and a test set, the classifier is trained on the former, and its predictive error rate is estimated on the latter. When filter approaches are adopted, the relevance of a feature subset is computed from the correlations between its features and the labels of the instances (Yu and Liu, 2003). In many situations, however, no label information about the data instances is available, so labels cannot be used to estimate the quality of a candidate feature subset (Dy and Brodley, 2004). The absence of instance labels makes feature selection for unsupervised clustering considerably more difficult.
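
To make the two evaluation styles concrete, the sketch below scores a candidate subset both ways in the supervised setting. The data set, the subset of feature indices, and the choice of a k-NN classifier are illustrative assumptions, not anything prescribed by the paper.

```python
# Wrapper vs. filter evaluation of one candidate feature subset (a sketch).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
subset = [0, 2]  # a hypothetical candidate feature subset

# Wrapper: train a classifier on the subset and estimate its predictive
# accuracy on held-out instances (cf. Kohavi and John, 1997).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = KNeighborsClassifier(n_neighbors=3).fit(X_tr[:, subset], y_tr)
wrapper_score = clf.score(X_te[:, subset], y_te)

# Filter: score the subset by the correlation between each of its features
# and the class labels, without training any classifier.
filter_score = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])

print(f"wrapper accuracy = {wrapper_score:.3f}, filter score = {filter_score:.3f}")
```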

Feature ranking is a relaxed version of feature selection: it ranks all features with respect to their relevance, after which the top-ranked features are chosen manually as the working feature vector (see the sketch below). Feature ranking can therefore be viewed as a flexible form of feature subset selection. It has been well studied in the supervised classification area (Guyon et al., 2004, Stoppiglia et al., 2003). In this paper, we propose a novel unsupervised feature ranking approach, termed unsupervised feature ranking from multiple views (FRMV). FRMV aggregates multiple feature rankings obtained from different views of the same data into a single consensus ranking, and is therefore often able to achieve a better feature ranking than a single feature ranking approach. We tested FRMV on several real data sets, and the experimental results indicate its potential and effectiveness.
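
A minimal sketch of this "rank, then cut manually" usage: order features by relevance scores and keep the top k. The scores here are placeholders; FRMV produces its rankings differently (Section 3).

```python
import numpy as np

scores = np.array([0.12, 0.85, 0.40, 0.73, 0.05])  # hypothetical relevances
ranking = np.argsort(-scores)                      # best feature first
k = 2                                              # cut-off chosen manually
working_features = ranking[:k]
print(working_features)                            # [1 3]
```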

This work makes two contributions. First, we extend the feature ranking methodology to the unsupervised data clustering area. Second, we propose a stable and robust unsupervised feature ranking approach based on ensembles of multiple feature rankings obtained from different views of the same data set. To the best of our knowledge, very little work has used ensemble learning for feature selection, apart from Jong et al. (2004). In (Jong et al., 2004), a supervised feature ranking approach is proposed: several rules are extracted from the data set by genetic algorithms, each rule corresponding to one ranking of the features, and all rankings are then aggregated into a consensus one by majority voting. Unlike (Jong et al., 2004), where the labels of all data items are known beforehand and the population of diverse feature rankings is produced by genetic algorithms, our approach is unsupervised and the population of diverse feature rankings is obtained by the random subspace method (Ho, 1998).
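
The sketch below illustrates the overall pipeline just summarized: build diverse feature rankings from random subspaces (Ho, 1998) of the same data, then aggregate them into one consensus ranking. The per-view relevance score (correlation of each feature with k-means cluster labels) and the mean-rank aggregation are illustrative assumptions; the paper's exact scoring and aggregation schemes are those detailed in Section 3.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def rank_features_in_view(X, view, n_clusters=3):
    """Cluster one random subspace, then rank every feature against it."""
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(X[:, view])
    # Assumed relevance score: |correlation| of each feature with the
    # cluster labels induced by this view.
    scores = np.array([abs(np.corrcoef(X[:, j], labels)[0, 1])
                       for j in range(X.shape[1])])
    order = np.argsort(-scores)                  # best feature first
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(order) + 1)  # rank of f_j in this view
    return ranks

def frmv_consensus(X, n_views=20, subspace_size=None):
    """Aggregate rankings from many random subspaces (Borda-style mean rank)."""
    n = X.shape[1]
    subspace_size = subspace_size or max(2, n // 2)
    all_ranks = [rank_features_in_view(
                     X, rng.choice(n, subspace_size, replace=False))
                 for _ in range(n_views)]
    mean_rank = np.mean(all_ranks, axis=0)
    return np.argsort(mean_rank)                 # consensus ordering
```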

The remainder of this paper is organized as follows. Section 2 briefly reviews related work. Section 3 describes FRMV in detail. Section 4 reports experimental results on several real data sets. Section 5 concludes the paper.

Related work

In this section, the literature on unsupervised feature selection and unsupervised clustering ensembles is briefly reviewed.

Unsupervised feature ranking from multiple views

This section describes FRMV. Before going further, we introduce the notation used throughout this paper. Let D = {d_1, d_2, …, d_N} denote a data set containing N unlabeled instances, where d_ij is the value of feature f_j in the ith instance, and let F = {f_1, f_2, …, f_n} be the set of all features. RF^(k) = {rank^(k)(f_1), rank^(k)(f_2), …, rank^(k)(f_n)} represents the kth ranking of all features, where rank^(k)(f_j) is the rank of feature f_j in the kth feature ranking and 1 ≤ rank^(k)(f_j) ≤ n.
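
A tiny worked example of this notation: two rankings RF^(1) and RF^(2) of n = 4 features, stored as rank vectors whose jth entry holds rank^(k)(f_j), combined here by average rank. The averaging step is only an illustration of aggregating rankings, not the paper's exact rule.

```python
import numpy as np

RF1 = np.array([2, 1, 4, 3])        # rank^(1)(f_1), ..., rank^(1)(f_4)
RF2 = np.array([1, 2, 3, 4])        # rank^(2)(f_1), ..., rank^(2)(f_4)
mean_rank = (RF1 + RF2) / 2         # [1.5, 1.5, 3.5, 3.5]
consensus = np.argsort(mean_rank)   # feature indices, best first
print(consensus)                    # [0 1 2 3] (ties broken by index)
```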

Experimental results and analysis

Nine UCI data sets are selected to test the performance of FRMV (Blake and Merz). Their names and characteristics are shown in Table 1. In our experiments, we use the Rand Index to measure the accuracy of a clustering solution I (Rand, 1971). The Rand Index of the clustering solution I against the accurate partition I^(accurate) is calculated as

Rand(I, I^(accurate)) = 2 · (n_00 + n_11) / (n · (n − 1)),

where n_11 is the number of pairs of instances that are in the same group in both I and I^(accurate), n_00 denotes the number of pairs that are in different groups in both I and I^(accurate), and n is the number of instances.
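
A direct implementation of the formula above, written from the definition in the text; the label vectors in the usage line are hypothetical.

```python
from itertools import combinations

def rand_index(labels_pred, labels_true):
    """Rand(I, I_acc) = 2 * (n00 + n11) / (n * (n - 1))."""
    n11 = n00 = 0
    for i, j in combinations(range(len(labels_pred)), 2):
        same_pred = labels_pred[i] == labels_pred[j]
        same_true = labels_true[i] == labels_true[j]
        n11 += same_pred and same_true               # together in both
        n00 += (not same_pred) and (not same_true)   # apart in both
    n = len(labels_pred)
    return 2.0 * (n00 + n11) / (n * (n - 1))

print(rand_index([0, 0, 1, 1, 2], [0, 0, 1, 2, 2]))  # 0.8
```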

Conclusions

This paper studied the problem of feature ranking in the unsupervised clustering area. We have proposed a consensus unsupervised feature ranking approach that combines multiple rankings of the full feature set into a single consensus ranking. The proposed approach was tested on several real data sets, and the experimental results demonstrate that an ensemble of multiple feature rankings is able to achieve a better ranking than the one obtained by a single feature ranking approach.

Acknowledgement

This project is supported by Project No. 7002023, City University of Hong Kong. The authors would like to thank the reviewers for their comments and suggestions.

References (36)

  • Fern, X.Z., Brodley, C.E., 2003. Clustering ensembles for high dimensional data clustering. In: Proc. Internat. Conf....
  • Fischer, J., et al., 2003. Bagging for path-based clustering. IEEE Trans. Pattern Anal. Machine Intell.
  • Fred, A., et al., 2005. Combining multiple clusterings using evidence accumulation. IEEE Trans. Pattern Anal. Machine Intell.
  • Fukunaga, K., 1990. Introduction to Statistical Pattern Recognition.
  • Guyon, I., et al., 2003. An introduction to variable and feature selection. J. Mach. Learn. Res.
  • Guyon, I., et al., 2004. Gene selection for cancer classification using support vector machines. Mach. Learn.
  • Hall, M.A., 2000. Correlation based feature selection for discrete and numeric class machine learning. In: Proc....
  • Hall, M.A., Smith, L.A., 1997. Feature subset selection: a correlation based filter approach. In: Proc. Internat. Conf....