Bi-objective feature selection for discriminant analysis in two-class classification
Introduction
The aim of the classification problem is to assign cases, characterized by attributes or variables, to classes, that is, to determine which class each instance belongs to. Based on a set of examples whose class is known, a set of rules is designed and then generalized to classify new instances with the greatest possible precision.
There are several methodologies for dealing with this problem: classic discriminant analysis, logistic regression, neural networks, decision trees, instance-based learning, etc. Most discriminant analysis methods search for hyperplanes in the variable space that best separate the classes of instances [19]. This translates into searching for linear functions and then using them for classification purposes (Wald, Fisher, etc.). The use of linear functions enables better interpretation of the results (e.g., the importance and/or significance of each variable in instance classification) by analyzing the values of the coefficients obtained. Not every classification method is suited to this type of analysis; in fact, some are classified as "black box" models. Thus, classic discriminant analysis is still a useful methodology. There are recent applications of discriminant analysis and its variants; see, for example, the work of Lu et al. [28].
When designing a classification method for a problem that involves many variables, only those that are essential should be selected, and so the first step is to eliminate less significant variables from the analysis. The problem of finding a subset of variables that can be used to carry out the classification task in an optimal way is known as the variable selection or feature selection problem. Research into this issue was started in the early 1960s by Lewis [22] and Sebestyen [33]. According to Liu and Motoda [26], feature selection provides advantages such as reducing the cost of data acquisition, better understanding of the final classification model, and an increase in the efficiency and efficacy of that model. Extensive research into variable selection has been carried out over the past four decades; many of the publications are related to medicine and biology [35], [13], [20], [21], [34], [38], [24], [27], [25], [36], [37]. Recently, it has been applied to financial databases, specifically in bankruptcy prediction [39], [5], [23], [40], [7].
Feature selection problems for classification are, by nature, multi-objective optimization problems, since, even in the simplest case, they involve feature subset size minimization and performance maximization. However, there are very few articles that address them from the multi-objective perspective. Most papers that do have the goal of minimizing the number of variables and either maximizing performance or minimizing the error [11], [14], [4], [42].
In two-class classification, that is, when items are classified into two categories (class 1 and class 2), two types of errors may occur:
- Type-I error: misclassifying an item belonging to class 1 as one of class 2.
- Type-II error: misclassifying an item belonging to class 2 as one of class 1.
In many practical situations, errors of one type may have far more serious consequences than errors of the other type. For example, a preliminary screening test for cancer should not declare a malignant case to be benign and, subject to that condition, should avoid misclassification of benign cases as malignant as much as can be done by the test. Put differently, the probability for classifying a malignant case as benign should be zero or almost zero, and, subject to that condition, the probability of classifying a benign case as malignant should be as small as can be achieved by the test.
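These two error ratios can be computed directly from a classifier's predictions. The following minimal Python sketch, using the class labels 1 and 2 as above (and assuming both classes are present, with hypothetical label vectors), illustrates the computation:

```python
def error_rates(y_true, y_pred):
    """Type-I and type-II error ratios for a two-class problem.

    Type-I error:  a class-1 case classified as class 2.
    Type-II error: a class-2 case classified as class 1.
    """
    n1 = sum(1 for t in y_true if t == 1)
    n2 = sum(1 for t in y_true if t == 2)
    e1 = sum(1 for t, q in zip(y_true, y_pred) if t == 1 and q == 2)
    e2 = sum(1 for t, q in zip(y_true, y_pred) if t == 2 and q == 1)
    return e1 / n1, e2 / n2

# Four cases of each class; one case of each class is misclassified
y_true = [1, 1, 1, 1, 2, 2, 2, 2]
y_pred = [1, 1, 2, 1, 2, 1, 2, 2]
print(error_rates(y_true, y_pred))   # (0.25, 0.25)
```

In the cancer-screening example above, one would demand that the first ratio be (almost) zero before comparing classifiers on the second.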
A first attempt to handle errors in classification can be found in the work of Felici et al. [12], which describes a general approach that supports tight error control for two-class classification. More recently, García-Nieto et al. [15] focused on studying sensitivity and specificity applied to cancer diagnosis. They accomplished the classification task using a Support Vector Machine, and the multi-objective problem is solved by a genetic algorithm. Huang et al. [18] take three objectives into account: the proportion of the total number of predictions that were correct, the proportion of cases from class 1 that were correctly identified, and the proportion of cases from class 2 that were classified correctly. They use a genetic algorithm, and the classification is accomplished by decision trees.
Evaluation by classification accuracy tacitly assumes equal error costs, that is, that a type-I error is as costly as a type-II error. However, this rarely holds in the real world, because classifications lead to actions which have consequences. Indeed, it is hard to imagine a domain in which a learning system may be indifferent to whether it makes a type-I or a type-II error.
In this work we focus on the problem of selecting variables for binary classification using discriminant analysis. We address it as a bi-objective problem that minimizes type-I error and minimizes type-II error, with the purpose of explicitly differentiating both errors. To the best of our knowledge, feature selection for discriminant analysis has not been studied from this perspective so far. This study has several important advantages:
- Commonly, both types of errors do not have the same importance; sometimes they cannot even be compared. In these cases, an aggregated objective function (as in the mono-objective approach) may be unsuitable for comparing solutions and assessing their convenience.
- A mono-objective approach yields a single solution, while the bi-objective approach yields a set of non-dominated solutions. This makes it possible to choose among several options and to analyse which is best depending on the context and circumstances.
- The bi-objective approach allows further analysis by examining the approximation to the efficient set: for example, how much the type-I error can be reduced with only a small increase in the type-II error, or vice versa. This kind of analysis supports proper decision making.
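The non-dominated set mentioned above can be extracted with a straightforward Pareto filter. A minimal sketch, with purely hypothetical (type-I error, type-II error) pairs for four candidate variable subsets:

```python
def non_dominated(solutions):
    """Return the non-dominated subset of (type-I error, type-II error)
    pairs; both objectives are to be minimized."""
    def dominates(a, b):
        # a dominates b: no worse in both objectives, not identical to b
        return a[0] <= b[0] and a[1] <= b[1] and a != b
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions)]

# Hypothetical error pairs for four candidate variable subsets
errors = [(0.05, 0.40), (0.10, 0.20), (0.12, 0.25), (0.30, 0.10)]
print(non_dominated(errors))   # (0.12, 0.25) is dominated by (0.10, 0.20)
```

Inspecting such a set is exactly the trade-off analysis described above: moving from (0.05, 0.40) to (0.10, 0.20) halves the type-II error at the cost of a small increase in the type-I error.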
As a solution method, we develop an adaptation of the NSGA-II algorithm [9], which is one of the most efficient multi-objective approaches and has been successfully used in data mining applications [8], [1], [17], [3], [16], [2].
In order to illustrate the above-mentioned advantages, a set of databases from the literature has been used to carry out computational experiments, and the feature selection problem has been analyzed from both the mono-objective and the bi-objective perspectives. One additional database, composed of Spanish firms, has been used in the experiments. It is an interesting example of the "credit scoring" problem and helps to highlight two interesting facts: (1) the importance of the two error types is different, and (2) the importance of each error type changes depending on the economic environment and other circumstances.
The rest of the paper is organized as follows. In Section 2 we state the problem, while in Section 3 we describe the proposed solution methodology. Computational experiments are presented in Section 4 and, finally, Section 5 is devoted to our conclusions.
Problem description
The addressed problem can be stated as follows: let V be a set of m variables, V = {v1, v2, … , vm}, and A a set of cases (also named the "training" set). For each case, the class (1 or 2) it belongs to is known.
Given a predefined integer value p, p < m, we have to find a subset S ⊂ V of size p that minimizes the type-I and type-II errors for the discriminant analysis. More precisely, we consider two objective functions, f1(S) and f2(S), defined as the ratios in A of type-I and type-II errors, respectively.
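As an illustration of how f1(S) and f2(S) can be evaluated for a candidate subset S, the sketch below uses a nearest-class-mean rule over the selected variables, a simple linear classifier standing in for full discriminant analysis; the training set A, labels, and subset are hypothetical:

```python
def evaluate_subset(S, A, labels):
    """Evaluate f1(S) and f2(S): the type-I and type-II error ratios on
    the training set A when classifying with only the variables in S.

    A nearest-class-mean rule over the selected variables is used here
    as a simple linear stand-in for full discriminant analysis.
    """
    proj = [[row[j] for j in S] for row in A]        # keep only variables in S
    c1 = [x for x, y in zip(proj, labels) if y == 1]
    c2 = [x for x, y in zip(proj, labels) if y == 2]
    m1 = [sum(col) / len(c1) for col in zip(*c1)]    # class-1 mean
    m2 = [sum(col) / len(c2) for col in zip(*c2)]    # class-2 mean

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    pred = [1 if dist2(x, m1) <= dist2(x, m2) else 2 for x in proj]
    f1 = sum(1 for y, q in zip(labels, pred) if y == 1 and q == 2) / len(c1)
    f2 = sum(1 for y, q in zip(labels, pred) if y == 2 and q == 1) / len(c2)
    return f1, f2

# Hypothetical training set A with m = 3 variables; variable 0 separates
# the classes well, variable 1 does not.
A = [[0.0, 5.0, 1.0], [0.2, 3.0, 0.0], [0.1, 4.0, 1.0],
     [1.0, 4.5, 0.0], [1.2, 3.5, 1.0], [0.9, 5.5, 0.0]]
labels = [1, 1, 1, 2, 2, 2]
print(evaluate_subset([0], A, labels))   # variable 0 alone: (0.0, 0.0)
```

The feature selection problem is then the search, over all subsets S of size p, for the ones whose error pair (f1(S), f2(S)) is non-dominated.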
Solution approach: an adaptation of NSGA-II algorithm
NSGA-II is an improvement over NSGA (non-dominated sorting genetic algorithm) that deals with three major drawbacks of the original approach: (1) the high computational cost of non-dominated sorting, (2) the lack of elitism, and (3) the lack of a parameter-free diversity-preservation mechanism [9].
To solve the addressed problem we have developed a procedure based on NSGA-II, which will be referred to as NSGAFS (NSGA-II for Feature Selection). Fig. 1 shows the outline of the NSGAFS algorithm.
As it can be observed, the
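To convey the general idea, the following is only an illustrative Python sketch of an NSGA-II-style loop for fixed-size feature selection. It is not the authors' NSGAFS as outlined in Fig. 1: it omits crossover and crowding-distance selection, keeping only elitist non-dominated survival and a size-preserving swap mutation, and the `evaluate` function (mapping a subset to its error pair) is assumed to be supplied by the caller:

```python
import random

def nsgafs_sketch(m, p, evaluate, generations=50, pop_size=20, seed=0):
    """Illustrative NSGA-II-style loop for fixed-size feature selection.

    Solutions are frozensets of p variable indices out of m; `evaluate`
    maps a subset to its (type-I error, type-II error) pair.  Unlike the
    full NSGAFS, only elitist non-dominated survival and a swap mutation
    are used (no crossover, no crowding distance).
    """
    rng = random.Random(seed)

    def dominates(a, b):
        return a[0] <= b[0] and a[1] <= b[1] and a != b

    def front(pop):
        scored = [(s, evaluate(s)) for s in pop]
        return [s for s, f in scored
                if not any(dominates(g, f) for _, g in scored)]

    def mutate(s):
        child = set(s)
        child.remove(rng.choice(sorted(child)))          # drop one variable
        child.add(rng.choice([v for v in range(m) if v not in child]))
        return frozenset(child)                          # size p preserved

    pop = {frozenset(rng.sample(range(m), p)) for _ in range(pop_size)}
    for _ in range(generations):
        offspring = {mutate(s) for s in pop}
        pop = set(front(pop | offspring))                # elitist survival
        while len(pop) < pop_size:                       # keep diversity
            pop.add(frozenset(rng.sample(range(m), p)))
    return sorted((evaluate(s), tuple(sorted(s))) for s in front(pop))

# Toy objective (hypothetical): including variable 0 removes the type-I
# error, including variable 1 removes the type-II error.
def toy_eval(s):
    return (0.0 if 0 in s else 0.5, 0.0 if 1 in s else 0.5)

best = nsgafs_sketch(m=10, p=3, evaluate=toy_eval)
print(best[0])
```

On this toy objective the loop converges to subsets containing both variables 0 and 1, i.e., the single efficient point (0.0, 0.0).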
Computational experiments
To assess the relevance of formulating the feature selection problem for classification as a bi-objective one, as well as to check and compare the efficacy of our proposed method NSGAFS, a series of experiments was run on different databases. These databases are described in the next subsection. In Section 4.2, a comparison of our NSGAFS with different approaches to the single-objective problem is shown. The results obtained with the proposed method are presented in Section 4.3. A case study is
Conclusions
This work deals with the problem of feature selection for discriminant analysis in two-class classification. We explicitly consider two objectives: minimizing the type-I error ratio and minimizing the type-II error ratio. The problem is thus treated as a bi-objective optimization problem. In order to obtain an approximation to the efficient curve, we propose and implement an adaptation of the NSGA-II algorithm.
The importance of studying this bi-objective problem is given by the fact that
Acknowledgements
This work has been partially supported by the Research Chair in Industrial Engineering of Tecnológico de Monterrey (ITESM Research Fund CAT128), FEDER funds and the Spanish Ministry of Science (Project ECO2008-06159/ECON), and the Regional Government of Castilla y León, Spain (Project BU008A10-2). This support is gratefully acknowledged.
References (43)
- et al., NSGA-II-trained neural network approach to the estimation of prediction intervals of scale deposition rate in oil & gas equipment, Expert Systems with Applications (2013)
- et al., A novel bankruptcy prediction model based on an adaptive fuzzy k-nearest neighbor method, Knowledge-Based Systems (2011)
- et al., Bankruptcy prediction models based on multinorm analysis: an alternative to accounting ratios, Knowledge-Based Systems (2012)
- et al., Sensitivity and specificity based multiobjective approach for feature selection: application to cancer diagnosis, Information Processing Letters (2009)
- et al., Feature subset selection by Bayesian networks based optimization, Artificial Intelligence (2000)
- et al., The random subspace binary logit (RSBL) model for bankruptcy prediction, Knowledge-Based Systems (2011)
- Feature selection based on cluster and variability analyses for ordinal multi-class classification problems, Knowledge-Based Systems (2013)
- et al., Supervised immune clonal evolutionary classification algorithm for high-dimensional data, Neurocomputing (2012)
- et al., Incremental learning of complete linear discriminant analysis for face recognition, Knowledge-Based Systems (2012)
- et al., Analysis of new variable selection methods for discriminant analysis, Computational Statistics and Data Analysis (2006)
- A variable selection method based on tabu search for logistic regression models, European Journal of Operational Research
- Feature selection using dynamic weights for classification, Knowledge-Based Systems
- Feature selection in bankruptcy prediction, Knowledge-Based Systems
- Simple instance selection for bankruptcy prediction, Knowledge-Based Systems
- A discrete particle swarm optimization method for feature selection in binary classification problems, European Journal of Operational Research
- Multiobjective evolutionary algorithms: a survey of the state of the art, Swarm and Evolutionary Computation
- A new multi-objective evolutionary approach for creating ensemble of classifiers, IEEE International Conference on Systems, Man and Cybernetics
- A multiobjective evolutionary approach to concurrently learn rule and data bases of linguistic fuzzy-rule-based systems, IEEE Transactions on Fuzzy Systems
- Application of a niched Pareto genetic algorithm for selecting features for nuclear transients classification, International Journal of Intelligent Systems
- Evolutionary Algorithms for Solving Multi-Objective Problems
- Data mining using multi-objective evolutionary algorithms, Proceedings of IEEE Congress on Evolutionary Computation