Analysis of new variable selection methods for discriminant analysis
Introduction
The aim in the classification problem is to classify instances that are characterized by attributes or variables; that is, to determine which class each instance belongs to. Based on a set of examples (whose class is known), a set of rules is designed and generalised to classify the set of instances with the greatest possible precision.
There are several methodologies for dealing with this problem: classic discriminant analysis, logistic regression, neural networks, decision trees, instance-based learning, etc. Most discriminant analysis methods search for hyperplanes in variable space that best separate the classes of instances. This translates into searching for linear functions and then using them for classification purposes (Wald, Fisher, etc.). The use of linear functions enables better interpretation of the results (e.g., the importance and/or significance of each variable in instance classification) by analysing the value of the coefficients obtained. Not every classification method is suited to this type of analysis, and in fact some are classified as “black box” models. Thus, classic discriminant analysis continues to be an interesting methodology.
When many variables are involved, only those variables that are really required should be selected before designing a classification method; that is, the first step is to eliminate the less significant variables from the analysis.
Thus, the problem consists of finding a subset of variables that can carry out this classification task in an optimum way. This problem is known as variable selection or feature selection. Research into this issue was started in the early 1960s by Lewis (1962) and Sebestyen (1962). According to Liu and Motoda (1998), feature selection provides advantages such as reducing the costs of data acquisition, better understanding of the final classification model, and an increase in the efficiency and efficacy of such a model. Extensive research into variable selection has been carried out over the past four decades. Many studies on variable selection are related to medicine and biology, such as Sierra et al. (2001), Ganster et al. (2001), Inza et al. (2000), Lee et al. (2003), Shy and Suganthan (2003), and Tamoto et al. (2004).
From a computational point of view, variable selection is an NP-hard problem (Kohavi, 1995, Cotta et al., 2004) and therefore there is no guarantee of finding the optimum solution. This means that when the size of the problem is large, finding an optimum solution in practice is unfeasible. Two different methodological approaches have been developed for variable selection problems: (a) the optimal or exact techniques (enumerative techniques), which are able to guarantee an optimal solution but are only applicable to small-sized sets; and (b) the heuristic techniques, which are able to find good solutions (although unable to guarantee the optimum) in a reasonable amount of time. Among the enumerative techniques, the Narendra and Fukunaga (1977) algorithm is one of the best known but, as pointed out by Jain and Zongker (1997), the algorithm is impractical for problems with very large feature sets. Recent references on implicit enumerative feature selection techniques adapted to regression models can be found in Gatu and Kontoghiorghes, 2003, Gatu and Kontoghiorghes, 2005, Gatu and Kontoghiorghes, 2006. On the other hand, the quality of ‘heuristic’ solutions varies strongly depending on the methods used. As found in other optimization problems, metaheuristic techniques have proved to be superior methodologies. For example, among the heuristic techniques we find the works based on genetic algorithms (see Bala et al., 1996, Jourdan et al., 2001, Oliveira et al., 2003; Inza et al., 2001a, Inza et al., 2001b; Wong and Nandi, 2004) and the recent work by García et al. (2006), who present a method based on Scatter Search.
These methods search for subsets with greater classification capacity based on different criteria. However, none of them focuses on the subsequent use of the selected variables in discriminant analysis. This work proposes some new “ad hoc” methods and compares the different variable selection methods for discriminant analysis. For this specific purpose the stepwise method (Efroymson, 1960) and all its variants, such as O’Gorman's (2004), as well as the backward and forward methods, can be found in the literature. These are simple selection procedures based on statistical criteria (Wilks' lambda, Fisher's F, etc.) which have been incorporated into some of the best-known statistical packages such as SPSS, BMDP, etc. As highlighted by Huberty (1994) and Salvador (2000), these methods are not very efficient, and when there are many original variables the optimum is rarely achieved. The methods proposed in this work yield significantly better results, as shown below.
The methods designed in this work are based on different metaheuristic techniques: GRASP, memetic algorithms, VNS, Tabu search and path relinking. Different tests were used to analyse and compare their efficacy with each other and with previous methods.
The remainder of this paper is organized as follows: the problem is modelled in Section 2; the GRASP procedure is described in Section 3 and the memetic algorithm in Section 4; the variable neighbourhood search procedure (VNS) is described in Section 5, and the Tabu search algorithm in Section 6. In Section 7, a modification for improving the robustness of the strategies is described and in Section 8 the results of the computational experiments are presented. Finally in Section 9 the main conclusions are offered.
Section snippets
Modelling the problem
We can formulate the problem of selecting the subset of variables with superior classification performance as follows: let V be a set of m variables, V = {x_1, ..., x_m}, and let A be a set of instances (also named the “training” set). For each case we also know the class it belongs to. Given a predefined value p < m, we have to find a subset S ⊂ V, with a size p, with the greatest classification capacity f(S) for the discriminant analysis.
To be precise, the function f(S) is defined as a percentage
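As a concrete illustration, a minimal stand-in for such a capacity function can be sketched as follows. This is an assumption for illustration only: it scores a subset S by the percentage of training instances correctly classified by a simple nearest-centroid rule restricted to the variables in S, not by the paper's actual discriminant function.

```python
def classification_capacity(S, X, y):
    """Toy f(S): percentage of training cases correctly classified by a
    nearest-centroid rule that only looks at the variables in S.
    X is a list of instances (lists of variable values), y their classes."""
    classes = sorted(set(y))
    # Per-class centroids restricted to the selected variables
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in S]
    hits = 0
    for x, label in zip(X, y):
        proj = [x[j] for j in S]
        pred = min(classes,
                   key=lambda c: sum((a - b) ** 2
                                     for a, b in zip(proj, centroids[c])))
        hits += pred == label
    return 100.0 * hits / len(X)
```

On a toy training set where one variable separates the classes and another is noise, selecting only the informative variable yields a capacity of 100.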
GRASP
Greedy randomised adaptive search procedure (GRASP) is a heuristic that constructs solutions with controlled randomisation and a greedy function. Most GRASP implementations also include a local search that is used to improve the solutions generated with the randomised greedy function. GRASP was originally proposed in the context of a set covering problem (Feo and Resende, 1989). Details of the methodology and a survey of applications can be found in Feo and Resende (1995) and Pitsoulis and
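The two GRASP phases (randomised greedy construction from a restricted candidate list, followed by local search) can be sketched on a toy additive objective. All names here are illustrative; the additive weights stand in for each variable's contribution to the real classification capacity f(S), which is not additive in the paper.

```python
import random

def grasp(weights, p, alpha=0.3, iters=20, seed=0):
    """Illustrative GRASP for picking p of m variables; `weights` is a
    hypothetical stand-in for each variable's contribution to f(S)."""
    rng = random.Random(seed)
    m = len(weights)
    f = lambda S: sum(weights[j] for j in S)  # toy additive objective
    best = None
    for _ in range(iters):
        # Phase 1 - greedy randomised construction: pick at random from a
        # restricted candidate list (RCL) of the best remaining variables.
        S = set()
        while len(S) < p:
            cand = sorted((j for j in range(m) if j not in S),
                          key=lambda j: weights[j], reverse=True)
            rcl = cand[:max(1, int(alpha * len(cand)))]
            S.add(rng.choice(rcl))
        # Phase 2 - local search: swap a selected variable for a better one
        improved = True
        while improved:
            improved = False
            for i in list(S):
                for j in range(m):
                    if j not in S and weights[j] > weights[i]:
                        S.remove(i); S.add(j)
                        improved = True
                        break
                if improved:
                    break
        if best is None or f(S) > f(best):
            best = S
    return best, f(best)
```

With an additive objective the local search always reaches the top-p subset; with a real discriminant-based f(S) the swap moves would re-evaluate the classifier instead.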
Memetic Algorithms
Memetic algorithms are also population-based methods and have proved to be faster than genetic algorithms for certain types of problems (Moscato and Laguna, 1996). In brief, they combine local search procedures with crossover or mutation operators; due to their structure, some researchers have called them hybrid genetic algorithms, parallel genetic algorithms (PGAs) or genetic local search methods. The method is gaining wide acceptance particularly for the well-known problems of
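The defining combination (a genetic loop whose offspring are improved by local search) can be sketched as follows. This is a minimal illustration on the same toy additive objective, not the paper's actual operators; crossover here keeps the parents' common variables and fills the rest from their union.

```python
import random

def memetic(weights, p, pop_size=8, gens=15, seed=1):
    """Sketch of a memetic algorithm: crossover + mutation, with every
    offspring improved by local search. The additive objective is a toy
    stand-in for classification capacity f(S)."""
    rng = random.Random(seed)
    m = len(weights)
    f = lambda S: sum(weights[j] for j in S)

    def local_search(S):
        S = set(S)
        improved = True
        while improved:
            improved = False
            worst = min(S, key=lambda j: weights[j])
            better = [j for j in range(m)
                      if j not in S and weights[j] > weights[worst]]
            if better:
                S.remove(worst)
                S.add(max(better, key=lambda j: weights[j]))
                improved = True
        return S

    def crossover(a, b):
        # Keep common variables, complete the child from the parents' union
        child = set(a & b)
        pool = list((a | b) - child)
        rng.shuffle(pool)
        while len(child) < p:
            child.add(pool.pop())
        return child

    pop = [local_search(set(rng.sample(range(m), p))) for _ in range(pop_size)]
    for _ in range(gens):
        a, b = rng.sample(pop, 2)
        child = crossover(a, b)
        if rng.random() < 0.2:  # mutation: swap a random variable out
            j_in = rng.choice(sorted(child))
            j_out = rng.choice([j for j in range(m) if j not in child])
            child.remove(j_in); child.add(j_out)
        child = local_search(child)
        pop.sort(key=f)          # replace the worst member if the child is better
        if f(child) > f(pop[0]):
            pop[0] = child
    best = max(pop, key=f)
    return best, f(best)
```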
VNS (Variable neighbourhood search)
VNS is a recent metaheuristic for solving optimization problems. Its basic idea is the systematic change of neighbourhood within a local search (Hansen and Mladenovic, 1998, Hansen and Mladenovic, 1999). Two recent tutorials were published by Hansen and Mladenovic, 2002, Hansen and Mladenovic, 2003. More information is available at: http://vnsheuristic.ull.es. The VNS procedure is as follows:
  Read initial solution S
  Repeat
    Repeat
      - Randomly choose
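The systematic change of neighbourhood within a local search might be sketched like this: shake the incumbent in a size-k neighbourhood (replace k selected variables at random), improve the result by local search, then move to it and reset k on improvement, or widen the neighbourhood otherwise. The additive objective and all names are illustrative stand-ins.

```python
import random

def vns(weights, p, k_max=3, iters=20, seed=2):
    """Illustrative VNS for selecting p of m variables on a toy
    additive objective (a stand-in for the real f(S))."""
    rng = random.Random(seed)
    m = len(weights)
    f = lambda S: sum(weights[j] for j in S)

    def shake(S, k):
        # Neighbourhood N_k: replace k selected variables at random
        k = min(k, len(S), m - len(S))
        S = set(S)
        removed = rng.sample(sorted(S), k)
        pool = [j for j in range(m) if j not in S]
        added = rng.sample(pool, k)
        return (S - set(removed)) | set(added)

    def local_search(S):
        S = set(S)
        improved = True
        while improved:
            improved = False
            for i in sorted(S):
                better = [j for j in range(m)
                          if j not in S and weights[j] > weights[i]]
                if better:
                    S.remove(i)
                    S.add(max(better, key=lambda j: weights[j]))
                    improved = True
                    break
        return S

    S = local_search(set(rng.sample(range(m), p)))
    for _ in range(iters):
        k = 1
        while k <= k_max:
            S2 = local_search(shake(S, k))
            if f(S2) > f(S):
                S, k = S2, 1   # move and restart from the smallest neighbourhood
            else:
                k += 1         # widen the neighbourhood
    return S, f(S)
```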
Description of a basic algorithm
Tabu search (TS) is a strategy proposed by Glover, 1989, Glover, 1990. “TS is dramatically changing our possibilities of solving a host of combinatorial problems in different areas” (Glover and Laguna, 2002). This procedure explores the solution space beyond the local optimum. Once a local optimum is reached, upward moves and moves that worsen the solution are allowed. Simultaneously, the last moves are marked as tabu during the following iterations to avoid cycling. Recent and comprehensive
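A basic TS loop along these lines can be sketched as follows: always take the best swap move, even when it worsens the objective, while forbidding the re-entry of recently removed variables; an aspiration criterion overrides the tabu status of any move that would beat the best solution found so far. The additive objective and every name here are illustrative stand-ins, not the paper's implementation.

```python
from collections import deque
import random

def tabu_search(weights, p, tenure=2, iters=30, seed=3):
    """Illustrative tabu search for selecting p of m variables on a toy
    additive objective (a stand-in for the real f(S))."""
    rng = random.Random(seed)
    m = len(weights)
    f = lambda S: sum(weights[j] for j in S)
    S = set(rng.sample(range(m), p))
    best, best_val = set(S), f(S)
    tabu = deque(maxlen=tenure)  # variables that may not re-enter yet
    for _ in range(iters):
        moves = [(i, j) for i in S for j in range(m) if j not in S]
        move_val = lambda mv: f(S) - weights[mv[0]] + weights[mv[1]]
        # Aspiration: a tabu move is allowed if it beats the best so far
        allowed = [mv for mv in moves
                   if mv[1] not in tabu or move_val(mv) > best_val]
        if not allowed:
            continue
        i, j = max(allowed, key=move_val)  # best move, improving or not
        S.remove(i); S.add(j)
        tabu.append(i)                     # forbid bringing i straight back
        if f(S) > best_val:
            best, best_val = set(S), f(S)
    return best, best_val
```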
Use of a validation set for improving the robustness in the strategies
It has been observed that the metaheuristic strategies described before (GRASP, memetic algorithms, VNS and TS) focus more on the optimization point of view than on the statistical point of view (i.e., on generalization). Because of this, with the aim of increasing the robustness of the strategies, a new set of instances (the “validation” set) is taken and used in the following way: a solution is only admitted as the new best solution if the number of fits in the validation set does not get
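The acceptance rule described above could be expressed as a small predicate. Since the snippet is truncated, the exact condition is an assumption: here a candidate replaces the incumbent only if it improves the training fit without losing fits on the held-out validation set, and all names are illustrative.

```python
def accept_as_best(cand_train_fit, cand_val_fit, best_train_fit, best_val_fit):
    """Hypothetical validation-gated acceptance: the candidate must beat
    the incumbent on the training set AND not do worse on the validation
    set (fit counts or percentages)."""
    return (cand_train_fit > best_train_fit
            and cand_val_fit >= best_val_fit)
```

Any of the metaheuristics above would call such a check before updating its best solution, which trades a little optimization pressure for robustness against overfitting the training set.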
Computational results
To check and compare the efficacy of the different methods, a series of experiments was run with different test problems. We selected data sets with enough instances for building large training sets (at least 10 cases for every degree of freedom), a validation set and 10 test sets from every data set. Using large training sets is recommended to obtain a trade-off between “optimization” and “generalization”. Six data sets were used. These data sets can be found in the well-known data repository
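A split along the lines described could be sketched as follows. The sizing rule is an assumption for illustration (training size taken as 10 cases per variable as a proxy for "10 cases per degree of freedom"), and all names are hypothetical.

```python
import random

def make_splits(n_instances, n_variables, n_test_sets=10, seed=4):
    """Illustrative partition of a data set's indices into one training
    set, one validation set, and n_test_sets disjoint test sets."""
    rng = random.Random(seed)
    idx = list(range(n_instances))
    rng.shuffle(idx)
    n_train = 10 * n_variables          # assumed sizing rule
    train = idx[:n_train]
    n_val = (n_instances - n_train) // (n_test_sets + 1)
    val = idx[n_train:n_train + n_val]
    rest = idx[n_train + n_val:]
    tests = [rest[i::n_test_sets] for i in range(n_test_sets)]
    return train, val, tests
```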
Conclusions
This work approaches the problem of variable selection for discriminant analysis. Although there are many references in the literature regarding selecting variables for their use in classification, there are very few key references on the selection of variables for their use in discriminant analysis. In fact, the most well-known statistical packages continue to use classic selection methods. In this work we propose as an alternative new methods based on various metaheuristic strategies. All of
Acknowledgements
The authors are grateful for financial support from the Spanish Ministry of Education and Science (National Plan of R&D - Projects SEJ2005-08923/ECON and SEJ2004-08176-02-01/ECON).
References (43)
- Bala, J., et al., 1996. Using learning to facilitate the evolution of features for recognizing visual concepts. Evol. Comput.
- Cotta, C., Sloper, C., Moscato, P., 2004. Evolutionary search of thresholds for robust feature set selection:...
- Efroymson, M.A., 1960. Multiple regression analysis.
- Feo, T.A., Resende, M.G.C., 1989. A probabilistic heuristic for a computationally difficult set covering problem. Oper. Res. Lett.
- Feo, T.A., Resende, M.G.C., 1995. Greedy randomized adaptive search procedures. J. Global Optim.
- Ganster, H., et al., 2001. Automated melanoma recognition. IEEE Trans. Med. Imaging.
- García, F.C., et al., 2006. Solving feature selection problem by a parallel scatter search. European J. Oper. Res.
- Gatu, C., Kontoghiorghes, E.J., 2003. Parallel algorithms for computing all possible subset regression models using the QR decomposition. Parallel Comput.
- Gatu, C., Kontoghiorghes, E.J., 2005. Efficient strategies for deriving the subset VAR models. Comput. Manag. Sci.
- Gatu, C., Kontoghiorghes, E.J., 2006. Branch-and-bound algorithms for computing the best-subset regression models. J. Comput. Graph. Stat.
- Glover, F., 1989. Tabu search. Part I. ORSA J. Comput.
- Glover, F., 1990. Tabu search. Part II. ORSA J. Comput.
- Glover, F., Laguna, M., 1997. Tabu Search.
- Glover, F., Laguna, M., 2002. Tabu search.
- Glover, F., Laguna, M., Martí, R., 2000. Fundamentals of scatter search and path relinking. Control Cybernet.
- Hansen, P., Mladenovic, N., 1999. An introduction to variable neighborhood search.
- Inza, I., et al., 2000. Feature subset selection by Bayesian networks based optimization. Artif. Intell.
- Inza, I., et al., 2001. Feature subset selection by genetic algorithms and estimation of distribution algorithms: a case study in the survival of cirrhotic patients treated with TIPS. Artif. Intell. Med.
- Inza, I., et al., 2001. Feature subset selection by Bayesian networks: a comparison with genetic and sequential algorithms. Int. J. Approx. Reason.
- Wong, M.L.D., Nandi, A.K., 2004. Automatic digital modulation recognition using artificial neural network and genetic algorithm. Signal Process.