Authors:
Hassen Dhrif
and
Stefan Wuchty
Affiliation:
Computer Science, University of Miami, Coral Gables, FL, U.S.A.
Keyword(s):
Feature Selection, Stability, Scalability, Particle Swarm Optimization, Evolutionary Computation, Gene Discovery.
Abstract:
Feature subset selection (FSS) is an intractable optimization problem in high-dimensional gene expression datasets, leading to an explosion of local minima. While binary variants of particle swarm optimization (BPSO) have been applied to the FSS problem, the increasing dimensionality of the feature space poses additional challenges to these techniques. It impairs their ability to select the most relevant feature subsets amid a massive number of uninformative features. Most FSS optimization techniques focus on maximizing classification performance while minimizing subset size, but they usually fail to account for solution stability or feature relevance in their optimization process. Notably, stability is interpreted differently in FSS than in PSO. Although a large volume of published studies exists on each stability issue separately, wrapper models that tackle both stability problems at the same time are still missing. Here, we introduce COMBPSO (COMBinatorial PSO), a novel approach that features a new fitness function integrating feature relevance and solution stability measures with classification performance and subset size. It also includes PSO adaptations that enhance the algorithm's convergence abilities. Applying our approach to real disease-specific gene expression data, we found that COMBPSO achieves classification performance similar to BPSO, but provides reliable classification with considerably smaller and more stable gene subsets.
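To make the idea of a multi-objective wrapper fitness concrete, the sketch below pairs a plain binary PSO with a weighted-sum fitness. It rewards classification accuracy, small subsets, feature relevance and a stability proxy. This is a minimal illustration under stated assumptions, not COMBPSO itself: the abstract does not give COMBPSO's fitness formula or its PSO adaptations. The weights, the per-feature `relevance` scores and the use of per-feature selection frequency across resampled runs as a stability proxy are all hypothetical, as are the helper names. The binary PSO shown is the common sigmoid-transfer variant with an inertia weight. The toy `accuracy` function stands in for a real classifier.

```python
import math
import random


def selection_frequency(masks):
    """Fraction of resampled runs (binary masks) in which each feature was selected.

    Hypothetical stability proxy: features picked consistently score near 1.0.
    """
    n = len(masks)
    return [sum(col) / n for col in zip(*masks)]


def fitness(mask, accuracy_fn, relevance, frequency, weights=(0.6, 0.1, 0.2, 0.1)):
    """Hypothetical weighted-sum fitness (higher is better); weights are assumptions."""
    k = sum(mask)
    if k == 0:
        return 0.0  # an empty subset cannot classify anything
    w_acc, w_size, w_rel, w_stab = weights
    size_term = 1.0 - k / len(mask)  # smaller subsets score higher
    rel_term = sum(r for r, m in zip(relevance, mask) if m) / k  # mean relevance
    stab_term = sum(f for f, m in zip(frequency, mask) if m) / k  # mean stability
    return (w_acc * accuracy_fn(mask) + w_size * size_term
            + w_rel * rel_term + w_stab * stab_term)


def bpso(n_features, score, n_particles=20, iters=50,
         inertia=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO with a sigmoid transfer function; maximizes score(mask)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [score(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (inertia * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp so bits do not saturate
                vel[i][d] = v
                # The sigmoid of the velocity is the probability that bit d becomes 1.
                pos[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0
            val = score(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Toy usage: features 0-2 are informative, the remaining 7 are noise.
n = 10
informative = 3
relevance = [0.9] * informative + [0.1] * (n - informative)
frequency = [0.8] * informative + [0.2] * (n - informative)


def accuracy(mask):
    """Toy stand-in for a classifier: accuracy grows with informative features kept."""
    return sum(mask[:informative]) / informative


def score(mask):
    return fitness(mask, accuracy, relevance, frequency)


best_mask, best_val = bpso(n, score)
print(best_mask, round(best_val, 3))
```

With these assumed weights, keeping exactly the three informative features is the best possible subset. Adding a noise feature lowers the size, relevance and stability terms, and dropping an informative one costs more accuracy than it gains in size. This illustrates how relevance and stability terms can steer the swarm toward smaller, more consistent subsets even when accuracy alone is unchanged.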