Information Sciences

Volume 181, Issue 1, 1 January 2011, Pages 115-128

Simultaneous feature selection and classification using kernel-penalized support vector machines

https://doi.org/10.1016/j.ins.2010.08.047

Abstract

We introduce an embedded method that simultaneously selects relevant features during classifier construction by penalizing each feature’s use in the dual formulation of support vector machines (SVM). This approach, called kernel-penalized SVM (KP-SVM), optimizes the shape of an anisotropic RBF kernel, eliminating features that have low relevance for the classifier. Additionally, KP-SVM employs an explicit stopping condition that avoids the elimination of features that would negatively affect the classifier’s performance. We performed experiments on four real-world benchmark problems, comparing our approach with well-known feature selection techniques. KP-SVM outperformed the alternative approaches and consistently selected fewer relevant features.

Introduction

Classification is one of the most important data mining tasks. The performance of classification models depends, among other elements, on an appropriate selection of the most relevant features. Feature selection is a combinatorial problem in the number of original features and offers the following advantages [1]:

  • A low-dimensional representation reduces the risk of overfitting [5], [10].

  • Using fewer features decreases the model’s complexity, which improves its generalization ability.

  • A low-dimensional representation requires less computational effort.

Among existing classification methods, support vector machines (SVMs) provide several advantages, such as adequate generalization to new objects, the absence of local minima, and a representation that depends on only a few parameters [21]. However, the standard SVM formulation does not determine the importance of the features used [10] and is therefore not suitable for feature selection.

This fact has motivated the development of several approaches for feature selection using SVMs (see, e.g., [6]). These methods generally work as filters, selecting features from a high-dimensional feature space before the subsequent classifier is designed. They provide a feature ranking, but do not consider which combination of variables optimizes classification performance. In this paper a novel embedded method for feature selection using SVM for classification problems is introduced. This method, called kernel-penalized SVM (KP-SVM), simultaneously determines a classifier with high classification accuracy and an adequate feature subset by penalizing each feature’s use in the dual formulation of the respective mathematical model. In numerical experiments on four well-known data sets, KP-SVM outperforms existing approaches.
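
For concreteness, the anisotropic RBF kernel mentioned above has the standard form (the paper’s exact parameterization, given in Section 4, may differ slightly):

    $K_\gamma(x, z) = \exp\left( -\sum_{j=1}^{n} \gamma_j (x_j - z_j)^2 \right), \qquad \gamma_j \ge 0,$

where each feature $j$ has its own width parameter $\gamma_j$. A feature with $\gamma_j = 0$ drops out of the kernel entirely, so driving the widths of irrelevant features to zero performs feature elimination inside the classifier itself.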

This paper is structured as follows. Section 2 introduces SVM for classification. Recent developments in feature selection using SVMs are reviewed in Section 3. KP-SVM, the proposed embedded method for feature selection based on SVM, is presented in Section 4. Section 5 provides experimental results on four real-world data sets. Several important aspects that arise from this work are discussed in Section 6. Section 7 summarizes the paper, provides its main conclusions, and addresses future developments.

Section snippets

Classification with SVM

Vapnik [21] developed SVMs for binary classification. This section introduces the respective approach using the following terminology. Given training vectors $x_i \in \mathbb{R}^n$, $i = 1, \ldots, m$, and a vector of labels $y \in \mathbb{R}^m$ with $y_i \in \{-1, +1\}$, SVM provides the optimal hyperplane $f(x) = w^T x + b$ that aims to separate the training patterns. In the case of linearly separable classes this hyperplane maximizes the sum of the distances to the closest positive and negative training patterns; this sum is called the margin. To construct the
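
For reference, the construction the snippet breaks off at is the standard soft-margin formulation of Vapnik [21], which maximizes the margin $2/\|w\|$ while penalizing margin violations through slack variables $\xi_i$:

    $\min_{w, b, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0.$

Its dual depends on the data only through inner products, which can be replaced by a kernel $K(x_i, x_k)$; this is the formulation whose feature use KP-SVM penalizes:

    $\max_{\alpha} \ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{k=1}^{m} \alpha_i \alpha_k y_i y_k K(x_i, x_k) \quad \text{s.t.} \quad \sum_{i=1}^{m} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C.$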

Feature selection for SVMs

According to [5], [6], there are three main directions for feature selection: filter, wrapper, and embedded methods. In this section we provide a brief overview of each of these approaches and present the methods that are compared with the proposed technique in this paper. The first direction (filter methods) uses statistical properties of the features to filter out poorly informative ones. This is usually done before applying any classification algorithm.

The Fisher
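
The snippet is cut off here; for completeness, a commonly used form of the Fisher criterion scores feature $j$ by the separation of its class-conditional means relative to the within-class variances:

    $F(j) = \frac{\left( \mu_j^{+} - \mu_j^{-} \right)^2}{\left( \sigma_j^{+} \right)^2 + \left( \sigma_j^{-} \right)^2},$

where $\mu_j^{\pm}$ and $\sigma_j^{\pm}$ are the mean and standard deviation of feature $j$ within the positive and negative class, respectively. Features are ranked by $F(j)$ and the top-scoring ones are kept, independently of any classifier.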

The proposed method for feature selection

An embedded method for feature selection using SVMs is proposed in this section. The reasoning behind this approach is that classification performance can be improved by optimizing the kernel function and thereby eliminating the features that harm the generalization of the classifier. The main idea is to penalize the use of features in the dual formulation of SVMs, using a gradient-descent approximation for kernel optimization and feature elimination. The proposed method attempts to find the best
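
The remainder of the formulation is cut off in this snippet and appears in Section 4 of the paper. As a rough illustration only, the sketch below alternates an SVM fit with a gradient step on the per-feature widths of the anisotropic RBF kernel shown earlier, zeroing widths that fall below a threshold eps. The objective, the linear penalty beta, the learning rate, and the elimination rule are all placeholder assumptions, and the paper’s explicit stopping condition is omitted.

    import numpy as np
    from sklearn.svm import SVC

    def aniso_rbf_gram(X, Z, gamma):
        """Gram matrix of K(x, z) = exp(-sum_j gamma_j * (x_j - z_j)**2)."""
        d2 = (X[:, None, :] - Z[None, :, :]) ** 2      # pairwise squared diffs
        return np.exp(-(d2 * gamma).sum(axis=-1))

    def kp_svm_sketch(X, y, C=1.0, beta=0.1, lr=0.5, eps=1e-3, iters=50):
        """Schematic KP-SVM-style loop (NOT the paper's exact algorithm):
        alternate (1) solving the dual SVM for fixed widths gamma with
        (2) a gradient step that decreases the dual objective plus the
        linear penalty beta * sum_j gamma_j; widths below eps are set
        to zero, which removes the feature from the kernel."""
        m, n = X.shape
        gamma = np.full(n, 1.0 / n)                    # initial per-feature widths
        for _ in range(iters):
            K = aniso_rbf_gram(X, X, gamma)
            svm = SVC(C=C, kernel="precomputed").fit(K, y)
            a = np.zeros(m)                            # a_i = alpha_i * y_i
            a[svm.support_] = svm.dual_coef_[0]
            # dW/dgamma_j = 0.5 * sum_{i,k} a_i a_k (x_ij - x_kj)^2 K_ik
            d2 = (X[:, None, :] - X[None, :, :]) ** 2  # O(m^2 n) memory: sketch only
            grad = 0.5 * np.einsum("i,k,ikj,ik->j", a, a, d2, K)
            gamma = np.clip(gamma - lr * (grad + beta), 0.0, None)
            gamma[gamma < eps] = 0.0                   # eliminate near-zero widths
            if not gamma.any():                        # degenerate: all pruned
                break
        return gamma, svm

The selected features are np.flatnonzero(gamma). The actual method couples the penalty into the dual formulation itself and uses its stopping condition to avoid pruning features that would hurt performance; neither detail is visible in this snippet.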

Experimental results

We applied the proposed approach for feature selection on four well-known benchmark data sets: two real-world data sets from the UCI repository [8] and two DNA microarray data sets. These data sets have already been used to compare feature selection algorithms (see, for example, [15], [22]).

For model selection we follow the procedure presented in [16]: training and test subsets are obtained from the original data set by dividing it randomly, preserving the proportions of the different classes.
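
Such a class-stratified random split is straightforward to reproduce; a minimal sketch (the split ratio is an assumption, not taken from [16]):

    from sklearn.model_selection import train_test_split

    # Random split that preserves the class proportions of y in both
    # subsets, as in the stratified procedure described above.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)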

Discussions

The main advantage of KP-SVM in terms of computational effort is that it automatically obtains an optimal feature subset, avoiding a validation step to determine how many ranked features should be used for classification. However, several parameters must be tuned to obtain the final solution. In this section we study the method’s performance by varying one parameter at a time and assessing its influence on the final solution.

For the different data sets we vary the parameters $C_2$, $\beta$, $\gamma$, $\epsilon$,
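
A one-parameter-at-a-time sweep of this kind can be scripted directly; the grids below are illustrative placeholders rather than the paper’s values, and kp_svm_sketch refers to the illustrative routine sketched in Section 4 above:

    # Vary one parameter at a time around a default configuration and
    # record how many features the (sketched) method keeps.
    defaults = dict(C=1.0, beta=0.1, lr=0.5, eps=1e-3)
    grids = {"C": [0.1, 1.0, 10.0],
             "beta": [0.01, 0.1, 1.0],
             "eps": [1e-4, 1e-3, 1e-2]}
    for name, values in grids.items():
        for v in values:
            gamma, svm = kp_svm_sketch(X_train, y_train, **{**defaults, name: v})
            print(name, v, "features kept:", int((gamma > 0).sum()))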

Conclusions

In this paper we present a novel embedded method for feature selection using SVMs. A comparison with other feature selection techniques shows the advantages of our approach:

  • Empirically, KP-SVM outperforms other filter and wrapper techniques, thanks to its ability to fit the data better by optimizing the kernel function while simultaneously selecting an optimal feature subset for classification.

  • Unlike most feature selection methods, it is not necessary to set in advance the number of features to be selected

Acknowledgements

Support from the Chilean Instituto Sistemas Complejos de Ingeniería (ICM: P-05-004-F, CONICYT: FBO16) is gratefully acknowledged (www.sistemasdeingenieria.cl). The first author also acknowledges a grant provided by CONICYT for his Ph.D. studies in Engineering Systems at Universidad de Chile.

References (24)

  • I. Guyon et al., Feature Extraction, Foundations and Applications (2006).

  • I. Guyon et al., Gene selection for cancer classification using support vector machines, Machine Learning (2002).
¹ The author is presently affiliated with NetApp Bangalore India, Advanced Technology Group.