Knowledge-Based Systems

Volume 44, May 2013, Pages 57-64

Bi-objective feature selection for discriminant analysis in two-class classification

https://doi.org/10.1016/j.knosys.2013.01.019

Abstract

This work deals with the problem of selecting variables (features) that are subsequently used in discriminant analysis. The aim is to find, from a set of m variables, smaller subsets which enable an efficient classification of cases into two classes. We consider two objectives, each one associated with the misclassification error in one class (type-I and type-II errors). Thus, we establish a bi-objective problem and develop an algorithm based on the NSGA-II strategy for this specific problem, in order to obtain a set of non-dominated solutions. Managing these two objectives separately (rather than jointly) allows an enhanced analysis of the obtained solutions by observing the approximation to the efficient frontier. This is especially significant when each type of error has a different level of importance or when they cannot be compared. To illustrate these issues, several known databases from the literature are used, as well as an additional database of Spanish firms characterized by financial variables and two classes: “creditworthy” and “non-creditworthy”. Finally, we show that when the solutions obtained by our NSGA-II implementation are evaluated from the classic mono-objective perspective (minimizing the two error ratios jointly), they are better than those obtained by classic feature selection methods and similar to those provided by other recently published methods.

Introduction

The aim in the classification problem is to classify cases that are characterized by attributes or variables, that is, to determine the class to which every instance belongs. Based on a set of examples (whose classes are known), a set of rules is designed and then generalized to classify the set of instances with the greatest possible precision.

There are several methodologies for dealing with this problem: classic discriminant analysis, logistic regression, neural networks, decision trees, instance-based learning, etc. Most discriminant analysis methods search for hyperplanes in the variable space that best separate the classes of instances [19]. This translates into searching for linear functions and then using them for classification purposes (Wald, Fisher, etc.). The use of linear functions enables better interpretation of the results (e.g., the importance and/or significance of each variable in instance classification) by analyzing the values of the coefficients obtained. Not every classification method is suited to this type of analysis; in fact, some are classified as “black box” models. Thus, classic discriminant analysis is still a useful methodology. There are recent applications of discriminant analysis and its variants; see, for example, the work of Lu et al. [28].

When designing a classification method for a problem which involves many variables, only those that are essential should be selected, so the first step is to eliminate the less significant variables from the analysis. The problem of finding a subset of variables that can be used to carry out the classification task in an optimal way is known as the variable selection or feature selection problem. Research into this issue was started in the early 1960s by Lewis [22] and Sebestyen [33]. According to Liu and Motoda [26], feature selection provides advantages such as reducing the costs of data acquisition, better understanding of the final classification model, and an increase in the efficiency and efficacy of such a model. Extensive research into variable selection has been carried out over the past four decades; many of the publications are related to medicine and biology [35], [13], [20], [21], [34], [38], [24], [27], [25], [36], [37]. Recently, it has been applied to financial databases, specifically in bankruptcy prediction [39], [5], [23], [40], [7].

Feature selection problems for classification are, by nature, multi-objective optimization problems, since, even in the simplest case, they involve minimizing the feature subset size and maximizing performance. However, very few articles address them from the multi-objective perspective. Most papers that do have the goal of minimizing the number of variables and either maximizing performance or minimizing the error [11], [14], [4], [42].

In two-class classification, that is, when items are classified into two categories (class 1 and class 2), two types of errors may occur:

  • Type-I error: misclassifying an item belonging to class 1 as one of class 2.

  • Type-II error: misclassifying an item belonging to class 2 as one of class 1.

In many practical situations, errors of one type may have far more serious consequences than errors of the other. For example, a preliminary screening test for cancer should not declare a malignant case to be benign and, subject to that condition, should avoid misclassifying benign cases as malignant as far as the test allows. Put differently, the probability of classifying a malignant case as benign should be zero or almost zero, and, subject to that condition, the probability of classifying a benign case as malignant should be as small as the test can achieve.
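
To make the two definitions above concrete, here is a minimal sketch (ours, not from the paper; the function name and the {1, 2} label encoding are illustrative) that computes both error rates from true and predicted class labels:

```python
import numpy as np

def error_rates(y_true, y_pred):
    """Return (type_I, type_II) error rates for labels in {1, 2}."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    class1, class2 = y_true == 1, y_true == 2
    # Type-I error: fraction of class-1 cases classified as class 2.
    type_I = np.mean(y_pred[class1] == 2) if class1.any() else 0.0
    # Type-II error: fraction of class-2 cases classified as class 1.
    type_II = np.mean(y_pred[class2] == 1) if class2.any() else 0.0
    return type_I, type_II

# Example: 1 of 4 class-1 cases and 1 of 3 class-2 cases misclassified.
print(error_rates([1, 1, 1, 1, 2, 2, 2],
                  [1, 2, 1, 1, 2, 1, 2]))  # -> (0.25, 0.333...)
```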

A first attempt to handle errors in classification can be found in the work of Felici et al. [12], which describes a general approach that supports tight error control for two-class classification. More recently, García-Nieto et al. [15] focused on studying sensitivity and specificity applied to cancer diagnosis. They accomplish the classification task using Support Vector Machines, and the multi-objective problem is solved with a genetic algorithm. Huang et al. [18] take into account three objectives: the proportion of the total number of predictions that are correct, the proportion of cases from class 1 that are correctly identified, and the proportion of cases from class 2 that are classified correctly. They use a genetic algorithm, and the classification is accomplished by decision trees.

Evaluation by classification accuracy tacitly assumes equal error costs, that is, that a type-I error is equivalent to a type-II error. However, this rarely happens in the real world, because classifications lead to actions, which have consequences. Indeed, it is hard to imagine a domain in which a learning system can be indifferent to whether it makes a type-I error or a type-II error.

In this work we focus on the problem of selecting variables for binary classification using discriminant analysis. We address it as a bi-objective problem that minimizes both the type-I error and the type-II error, with the purpose of explicitly differentiating the two errors. To the best of our knowledge, feature selection for discriminant analysis has not been studied from this perspective before. This approach has several important advantages:

  • Commonly, the two types of errors do not have the same importance; sometimes they cannot even be compared. In these cases, an aggregated objective function (as in the mono-objective approach) may be unsuitable for comparing solutions and assessing the suitability of each one.

  • With a mono-objective approach a single solution is obtained, whereas the bi-objective approach provides a set of non-dominated solutions. This allows choosing among several options and analysing which is best depending on the context and the circumstances.

  • The bi-objective approach allows further analysis by observing the approximation to the efficient set: for example, how much the type-I error can be reduced with only a small increase in the type-II error, or vice versa (see the sketch below). This kind of analysis supports proper decision making.
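
As a small illustration of this kind of frontier analysis (our own sketch; the (type-I, type-II) pairs below are made-up values, not results from the paper), a decision maker could cap one error and pick the best solution for the other:

```python
def pick_solution(front, max_type_II):
    """Smallest type-I error among solutions whose type-II error is acceptable."""
    feasible = [s for s in front if s[1] <= max_type_II]
    return min(feasible, key=lambda s: s[0]) if feasible else None

front = [(0.05, 0.40), (0.10, 0.25), (0.20, 0.12), (0.35, 0.05)]
print(pick_solution(front, max_type_II=0.15))  # -> (0.20, 0.12)
```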

As a solution method we develop an adaptation of the NSGA-II algorithm [9], which is one of the most efficient multi-objective approaches and has been successfully used in data mining applications [8], [1], [17], [3], [16], [2].
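
As a rough illustration of the mechanism at the core of NSGA-II, the following simplified sketch (ours; the crowding-distance computation and genetic operators of [9] are omitted) performs non-dominated sorting of a population whose objective vectors are (type-I error, type-II error) pairs to minimize:

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(pop):
    """Partition pop (a list of objective tuples) into fronts of indices."""
    n = len(pop)
    dominated = [[] for _ in range(n)]  # indices each solution dominates
    count = [0] * n                     # how many solutions dominate i
    for i in range(n):
        for j in range(n):
            if i != j:
                if dominates(pop[i], pop[j]):
                    dominated[i].append(j)
                elif dominates(pop[j], pop[i]):
                    count[i] += 1
    fronts = [[i for i in range(n) if count[i] == 0]]
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in dominated[i]:
                count[j] -= 1
                if count[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
    return fronts[:-1]

# Example with (type-I, type-II) error pairs:
pop = [(0.10, 0.30), (0.20, 0.20), (0.15, 0.35), (0.30, 0.10)]
print(non_dominated_sort(pop))  # -> [[0, 1, 3], [2]]
```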

In order to illustrate the above-mentioned advantages, a set of databases from the literature has been used to carry out computational experiments, and the feature selection problem has been analyzed from both the mono-objective and the bi-objective perspectives. One additional database, composed of Spanish firms, has been used in the experiments. It is an interesting example of the “credit scoring” problem and helps to highlight two interesting facts: (1) the two error types differ in importance, and (2) the importance of each error type changes depending on the economic environment and other circumstances.

The rest of the paper is organized as follows. In Section 2 we state the problem, while in Section 3 we describe the proposed methodology to solve it. Computational experiments are reported in Section 4 and, finally, Section 5 is devoted to our conclusions.

Section snippets

Problem description

The addressed problem can be stated as follows: let V be a set of m variables, V = {v1, v2, …, vm}, and let A be a set of cases (also called the “training” set). For each case it is known which class (1 or 2) it belongs to.

Given a predefined integer value p, p < m, we have to find a subset S ⊂ V of size p minimizing the type-I error and the type-II error for the discriminant analysis. More precisely, we will consider two objective functions, f1(S) and f2(S), defined as the ratios in A of type-I and type-II errors, respectively.
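
As an illustrative sketch of these two objective functions (our own code, using scikit-learn's linear discriminant analysis as a stand-in for the discriminant procedure; the paper does not prescribe this library, and all names are ours):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def f1_f2(X, y, S):
    """Type-I and type-II error ratios on the training set A for subset S.

    X : (n, m) array holding the m variables of V
    y : length-n labels in {1, 2}
    S : sequence of p column indices (the candidate subset, p < m)
    """
    X, y, S = np.asarray(X), np.asarray(y), list(S)
    pred = LinearDiscriminantAnalysis().fit(X[:, S], y).predict(X[:, S])
    f1 = np.mean(pred[y == 1] == 2)  # class-1 cases classified as class 2
    f2 = np.mean(pred[y == 2] == 1)  # class-2 cases classified as class 1
    return f1, f2
```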

Solution approach: an adaptation of NSGA-II algorithm

NSGA-II is an improvement over NSGA (non-dominated sorting genetic algorithm) that deals with three major drawbacks of the original approach: (1) the high computational cost of sorting, (2) the lack of elitism and (3) the lack of a parameter-free diversity-preservation mechanism [9].

To solve the addressed problem we have developed a procedure based on NSGA-II, which will be referred to as NSGAFS (NSGA-II for Feature Selection). Fig. 1 shows the outline of the NSGAFS algorithm.
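
Fig. 1 is not reproduced in this excerpt. Purely as an illustration of one design issue such a procedure must handle, the sketch below (our assumption, not necessarily the authors' operators) shows subset-preserving crossover and mutation that keep every individual at the fixed subset size p:

```python
import random

def crossover(parent1, parent2, p):
    """Build a child of size p, keeping the variables common to both parents."""
    common = parent1 & parent2
    child = set(common)
    pool = list((parent1 | parent2) - common)
    random.shuffle(pool)
    while len(child) < p:          # fill up with variables from either parent
        child.add(pool.pop())
    return child

def mutate(subset, m, rate=0.1):
    """With probability `rate`, swap one selected variable for an unselected one."""
    if random.random() < rate and len(subset) < m:
        leaving = random.choice(sorted(subset))
        entering = random.choice([v for v in range(m) if v not in subset])
        subset = (subset - {leaving}) | {entering}
    return subset

child = mutate(crossover({0, 2, 5}, {1, 2, 7}, p=3), m=10)
assert len(child) == 3             # subset size p is preserved
```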

As can be observed, the

Computational experiments

To assess the relevance of formulating the feature selection problem for classification as a bi-objective one, as well as to check and compare the efficacy of our proposed method, NSGAFS, a series of experiments was run with different databases. These databases are described in the next subsection. In Section 4.2 a comparison of our NSGAFS with different approaches for the single-objective problem is shown. The results obtained with the proposed method are presented in Section 4.3. A case study is

Conclusions

This work deals with the problem of feature selection for discriminant analysis in two-class classification. We explicitly consider two objectives: minimizing the type-I error ratio and minimizing the type-II error ratio. The problem is then treated as a bi-objective optimization problem. In order to obtain an approximation to the efficient curve, we propose and implement an adaptation of the NSGA-II algorithm.

The importance of studying this bi-objective problem is given by the fact that

Acknowledgements

This work has been partially supported by the Research Chair in Industrial Engineering of Tecnológico de Monterrey (ITESM Research Fund CAT128), FEDER funds, the Spanish Ministry of Science (Project ECO2008-06159/ECON) and the Regional Government of Castilla y León, Spain (Project BU008A10-2). This support is gratefully acknowledged.

References (43)

  • J. Pacheco et al.

    A variable selection method based on tabu search for logistic regression models

    European Journal of Operational Research

    (2009)
  • X. Sun et al.

    Feature selection using dynamic weights for classification

    Knowledge-Based Systems

    (2013)
  • C.-F. Tsai

    Feature selection in bankruptcy prediction

    Knowledge-Based Systems

    (2009)
  • C.-F. Tsai et al.

    Simple instance selection for bankruptcy prediction

    Knowledge-Based Systems

    (2012)
  • A. Unler et al.

    A discrete particle swarm optimization method for feature selection in binary classification problems

    European Journal of Operational Research

    (2010)
  • A. Zhou et al.

    Multiobjective evolutionary algorithms: a survey of the state of the art

    Swarm and Evolutionary Computation

    (2011)
  • K. Ahmadian et al.

    A new multi-objective evolutionary approach for creating ensemble of classifiers

    IEEE International Conference on Systems, Man and Cybernetics

    (2007)
  • R. Alcala et al.

    A multiobjective evolutionary approach to concurrently learn rule and data bases of linguistic fuzzy-rule-based systems

    IEEE Transactions on Fuzzy Systems

    (2009)
  • P. Baraldi et al.

    Application of a niched Pareto genetic algorithm for selecting features for nuclear transients classification

    International Journal of Intelligent Systems

    (2009)
  • C.A. Coello et al.

    Evolutionary Algorithms for Solving Multi-Objective Problems

    (2002)
  • B. de la Iglesia et al.

    Data mining using multi-objective evolutionary algorithms

    Proceedings of IEEE Congress on Evolutionary Computation

    (2003)