Elsevier

Applied Soft Computing

Volume 19, June 2014, Pages 57-67

Statistics-based wrapper for feature selection: An implementation on financial distress identification with support vector machine

https://doi.org/10.1016/j.asoc.2014.01.018

Abstract

Support vector machine (SVM) is an effective tool for financial distress identification (FDI). However, one issue that keeps SVM from being applied efficiently to FDI is how to select features. Filters are commonly employed, but this type of approach does not consider the predictive capability of SVM itself when selecting features. This research constructs a statistics-based wrapper for SVM-based FDI that uses statistical indices of ranking-order information derived from predictive performance under various parameters. The wrapper consists of four levels: the data level, the SVM-based model level, the feature ranking-order level, and the feature selection index level. Once the data are ready, the predictive accuracies of one type of SVM model, i.e., linear SVM (LSVM), polynomial SVM (PSVM), Gaussian SVM (GSVM), or sigmoid SVM (SSVM), are first calculated for various pairs of parameters. The performances of the SVM models on each candidate feature are then converted into ranking-order indices. Next, two statistical indices, the mean and the standard deviation, are calculated from the ranking-order information of each feature. Finally, the feature selection index of each feature is produced by combining the two statistical indices. Each feature whose feature selection index is smaller than half of the average index is selected into the optimal feature set. Using a dataset collected for Chinese FDI 3 years in advance, we statistically compared the performance of this statistics-based wrapper with that of a non-statistics-based wrapper, two filters, and no feature selection for SVM-based FDI. Results on unseen data indicate that GSVM with the statistics-based wrapper significantly outperformed the other SVM models with the other feature selection methods, as well as two wrapper-based classical statistical models.

Introduction

Financial distress identification (FDI) is an effective tool for risk management. The area has received considerable attention from both academia and industry [1], [2], [3], [5], [6], [11], [16], [19], [27], [29], [33], [34], [35], [36], [37], [39], [40], [41], [42], [44], [48]. Identifying whether or not a company will fail helps financial institutions, managers, employees, investors, and government officials control risk in their decisions. The predictive accuracy of the tool is a key index of whether it is helpful in the real world. Basically, a predictive model is assumed to be more useful if it is more accurate. A whole dataset is commonly partitioned into a training dataset, a validation dataset, and a testing dataset. The identification of new cases is simulated by using the model constructed on labelled data to predict unlabelled data. This partition is commonly repeated many times so that the significance of the results can be analysed statistically. Under this assumption, support vector machine (SVM) is an important technique for FDI for the following two reasons: (1) SVM is constructed from mature statistical learning theory [10], [46]; (2) previous evidence shows that SVM produced dominating predictive performance in FDI [12], [17], [18], [26], [30], [31], [38], [44], [47].
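The repeated partition described above can be illustrated with a minimal sketch. The 60/20/20 proportions, the 30 repetitions, and the names `split_dataset` and `repeated_splits` are assumptions made for illustration; the text does not state the proportions or the number of repetitions used.

```python
import random

def split_dataset(n_samples, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle sample indices and cut them into three disjoint parts."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(n_samples * train_frac)
    n_valid = int(n_samples * valid_frac)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # whatever is left is held out
    return train, valid, test

def repeated_splits(n_samples, n_repeats=30):
    """One independent partition per repetition, each with its own seed."""
    return [split_dataset(n_samples, seed=r) for r in range(n_repeats)]

# ASSUMPTION: 100 samples and 30 repetitions are illustrative values only.
splits = repeated_splits(n_samples=100, n_repeats=30)
```

Each repetition yields one accuracy figure on its testing part, and the resulting collection of figures is what a significance test would then compare across methods.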

Feature selection is a process that chooses information-rich features and retains the meaning of the original features [14], [23]. Filters and wrappers are the two chief methods of feature selection. Filters use an algorithm to search through the space of possible features and evaluate each subset by running a filter function on it. The filter function is not the same as the model used for prediction or classification, so this feature selection approach does not consider the preferences of the model. Wrappers are similar to filters, but evaluate each subset against the prediction model itself instead of a filter function.
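The difference between the two evaluation styles can be shown on a single feature. This is only a toy sketch: the class-mean gap used as the filter function, the nearest-class-mean classifier standing in for SVM, and the data values are all assumptions chosen to keep the example self-contained.

```python
from statistics import mean

def filter_score(values, labels):
    """Model-independent score: gap between the two class means."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return abs(mean(pos) - mean(neg))

def centroid_predict(train_values, train_labels, value):
    """Stand-in classifier (NOT an SVM): pick the class with the closer mean."""
    pos = mean([v for v, y in zip(train_values, train_labels) if y == 1])
    neg = mean([v for v, y in zip(train_values, train_labels) if y == 0])
    return 1 if abs(value - pos) < abs(value - neg) else 0

def wrapper_score(values, labels):
    """Model-dependent score: leave-one-out accuracy of the classifier itself."""
    hits = 0
    for i in range(len(values)):
        train_v = values[:i] + values[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += centroid_predict(train_v, train_y, values[i]) == labels[i]
    return hits / len(values)

# Hypothetical data: feature_a separates the classes, feature_b does not.
labels = [0, 0, 0, 1, 1, 1]
feature_a = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
feature_b = [2.0, 3.0, 1.0, 1.0, 2.0, 3.0]
```

`filter_score` never consults the classifier, whereas `wrapper_score` is defined entirely by how well the classifier predicts with that feature, which is the distinction the paragraph draws.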

A common drawback of previous research on SVM-based FDI is that it used either filters or a genetic algorithm to select optimal feature subsets for SVM. Wrappers are supposed to yield a feature subset that helps the model produce dominating predictive performance. Greedy hill climbing, which searches for the optimal feature subset by iteratively evaluating candidate subsets of features, is commonly used in wrappers. Genetic algorithm-based selection also belongs to the wrapper family. However, the drawback of using a genetic algorithm in a wrapper is that the output feature subset is not the same when the approach is run several times.
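Greedy hill climbing over feature subsets can be sketched as forward selection, which is one common variant and is assumed here for illustration. The `evaluate` callback would normally return the predictive accuracy of the model on a candidate subset; the `gain` table below is a hypothetical stand-in so that the example runs on its own.

```python
def greedy_forward_selection(features, evaluate):
    """Add one feature at a time while the evaluation score keeps improving."""
    selected, best_score = [], float("-inf")
    remaining = list(features)
    while remaining:
        # Score every subset obtained by adding one more feature.
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, feature = max(scored)
        if score <= best_score:
            break  # no single addition improves the score: local optimum
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score

# ASSUMPTION: hypothetical per-feature contributions replacing a real model.
gain = {"x1": 0.3, "x2": 0.2, "x3": -0.1}

def evaluate(subset):
    return sum(gain[f] for f in subset)

selected, score = greedy_forward_selection(["x1", "x2", "x3"], evaluate)
```

Given a deterministic `evaluate`, this search returns the same subset on every run. A genetic algorithm relies on random initialisation and mutation, which is why its output can differ between runs.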

This research attempts to construct a novel, stable wrapper for SVM to identify financial distress. Two key issues in applying SVM are kernel selection and parameter optimization. The new wrapper can be built on any of the following types of SVM: linear SVM (LSVM), polynomial SVM (PSVM), Gaussian SVM (GSVM), and sigmoid SVM (SSVM). After the kernel function is selected, many SVM models are produced by using various pairs of parameters. The predictions of these SVM models on each candidate feature are converted into ranking-order information for that feature. Two statistical indices, the mean and the standard deviation, are computed from the ranking-order information and combined into a feature selection index for SVM. This index is used to select the optimal features.
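The ranking-and-statistics steps can be sketched as follows. Several details are assumptions, because this text does not specify them: the mean and standard deviation are combined by a plain sum, the population standard deviation is used, ties in accuracy are broken by input order, and the accuracy values are invented. The selection rule (keep features whose index is below half of the average index) follows the abstract.

```python
from statistics import mean, pstdev

def rank_features(accuracies):
    """Rank features for one parameter pair: rank 1 = highest accuracy."""
    order = sorted(accuracies, key=accuracies.get, reverse=True)
    return {feature: position + 1 for position, feature in enumerate(order)}

def selection_indices(accuracy_by_params):
    """Combine the mean and S.D. of each feature's ranks into one index."""
    rankings = [rank_features(acc) for acc in accuracy_by_params]
    indices = {}
    for feature in rankings[0]:
        ranks = [ranking[feature] for ranking in rankings]
        # ASSUMPTION: the combination rule is not given here; a sum is used.
        indices[feature] = mean(ranks) + pstdev(ranks)
    return indices

def select_features(indices):
    """Keep features whose index is below half of the average index."""
    threshold = sum(indices.values()) / (2 * len(indices))
    return [f for f, value in indices.items() if value < threshold]

# Hypothetical accuracies of single-feature models under three parameter pairs.
accuracy_by_params = [
    {"f1": 0.90, "f2": 0.80, "f3": 0.70, "f4": 0.60},
    {"f1": 0.88, "f2": 0.72, "f3": 0.75, "f4": 0.65},
    {"f1": 0.91, "f2": 0.79, "f3": 0.66, "f4": 0.70},
]
indices = selection_indices(accuracy_by_params)
chosen = select_features(indices)
```

With this toy input only `f1` is retained: it ranks first under every parameter pair, so both its mean rank and its spread are small. Because the procedure involves no random search, repeating it on the same accuracies gives the same subset.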

This paper is organized as follows. Section 2 briefly reviews the feature selection and parameter search methods used in previous research on SVM-based FDI. Section 3 presents the new wrapper for SVM-based FDI. Section 4 designs an experiment to test the efficiency and feasibility of the statistics-based wrapper. Section 5 discusses the experimental results. Section 6 concludes.

Section snippets

Feature selection and parameter search in previous research on SVM-based FDI

When SVM was first applied to identify financial distress, Shin et al. [38] employed a two-stage feature selection process composed of a t-test followed by stepwise multivariate discriminant analysis (MDA). This type of feature selection belongs to the family of filters. The comparison of predictive performance between SVM and back-propagation neural networks (NN) indicated that SVM produced higher accuracy ratios than NN. A Gaussian kernel was used and its parameters were

Kernels and parameters of SVM when constructing the approach

A wrapper for SVM evaluates features against the predictive performance of SVM itself. The kernel function and parameters of SVM must be set before constructing a wrapper. There are four commonly used kernel functions for SVM, i.e., linear kernel (u′*v), polynomial kernel ((gamma*u′*v)^p), Gaussian kernel (exp(-gamma*|u-v|^2)) and sigmoid kernel (tanh(gamma*u′*v)) [7], [8]. Any one of the four commonly used kernel functions can be employed in the wrapper. The reason why we use the common range of {2^-10, 2^-9, …, 2^7, 2^8
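The four kernel formulas listed above translate directly into code. The sketch below implements them exactly as written in the text, using plain Python lists for the vectors u and v; the function names are illustrative. The candidate grid reads the listed range as powers of two from 2^-10 to 2^8. Which parameters are searched over this grid is cut off in this snippet, so the grid is shown without being tied to a specific parameter.

```python
import math

def dot(u, v):
    """Inner product u'*v of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)

def polynomial_kernel(u, v, gamma, p):
    return (gamma * dot(u, v)) ** p

def gaussian_kernel(u, v, gamma):
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(u, v, gamma):
    return math.tanh(gamma * dot(u, v))

# Candidate parameter values 2^-10, 2^-9, ..., 2^7, 2^8 (19 values).
grid = [2.0 ** k for k in range(-10, 9)]
```

For example, `linear_kernel([1, 2], [3, 4])` evaluates to 11, and the Gaussian kernel of any vector with itself is 1 regardless of gamma.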

Objective, data and variables

The objective of this empirical research is to test the effectiveness and feasibility of SVM with the statistics-based wrapper in solving the task of FDI. Pioneering research on feature selection for SVM was conducted by Chen and Lin [9]. They applied feature selection for SVM to a dataset with 500 features. Filters were first used to retain only 16 features, which were then used in a wrapper. We used the following procedure, namely: first using some filter rules to

Optimal features from the filters and wrappers

Gamma for PSVM was set to the default value of libSVM, since the model could not be trained on the data set when gamma was larger than 2^6. The means and S.D.s of the ranking orders of all features are listed in Table 3, where ΣFS/2M = 60.08. The feature selection index values of the statistics-based wrapper are also shown in Table 3. Meanwhile, features respectively selected by a wrapper integrating forward and backward selection on Mahalanobis distance with SVM [39] with RBF kernel and default

Implication and conclusion

One implication of the research is that it is effective to consider the preferences of a predictive model for FDI under various parameters in feature selection. Mean and standard deviation information is derived from the ranking-order information of the model's performance with various parameters on each feature. By using the statistics-based approach, preferences on features are converted into a feature selection index, which is used to select preferred features. Non-parametric techniques have

Acknowledgements

This research is partially supported by the National Natural Science Foundation of China (No. 71171179; 71371171), the Zhejiang Provincial National Science Foundation for Distinguished Young Scholars of China (No. LR13G010001), the Zhejiang Provincial National Science Foundation of China (No. LY13G010001), and the Humanities and Social Science Foundation of Ministry of Education of China (No. 13YJC630140). The authors gratefully thank editors and anonymous referees for their comments and

References (48)

  • H. Li et al. Predicting business failure using multiple case-based reasoning combined with support vector machine. Expert Systems with Applications (2009)
  • R. Lin et al. Developing a business failure prediction model via RST, GRA and CBR. Expert Systems with Applications (2009)
  • D. Martin. Early warning of bank failure: a logit regression approach. Journal of Banking and Finance (1977)
  • J.-H. Min et al. Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters. Expert Systems with Applications (2005)
  • S.-H. Min et al. Hybrid genetic algorithms and support vector machines for bankruptcy prediction. Expert Systems with Applications (2006)
  • P.C. Pendharkar. A threshold-varying artificial neural network approach for classification and its application to bankruptcy prediction problem. Computers and Operations Research (2005)
  • I.M. Premachandra et al. DEA as a tool for bankruptcy assessment: a comparative study with logistic regression technique. European Journal of Operational Research (2009)
  • V. Ravi et al. Soft computing system for bank performance prediction. Applied Soft Computing (2008)
  • K.-S. Shin et al. An application of support vector machines in bankruptcy prediction model. Expert Systems with Applications (2005)
  • J. Sun et al. Dynamic financial distress prediction using instance selection for the disposal of concept drift. Expert Systems with Applications (2011)
  • J. Sun et al. Financial distress early warning based on group decision making. Computers and Operations Research (2009)
  • L. Sun et al. Using Bayesian networks for bankruptcy prediction: some methodological issues. European Journal of Operational Research (2007)
  • C.-F. Tsai et al. Using neural network ensembles for bankruptcy prediction and credit scoring. Expert Systems with Applications (2008)
  • F.-M. Tseng et al. A quadratic interval logit model for forecasting bankruptcy. Omega (2005)

    Young Researcher of World Federation on Soft Computing.
