Rough set-based feature selection for credit risk prediction using weight-adjusted boosting ensemble method

  • Methodologies and Application
Soft Computing

Abstract

With the rapid growth of financial institutions, credit risk prediction (CRP) plays an essential role in loan-granting decisions and helps institutions minimize their losses, because approving credit for the wrong applicant can result in massive financial loss. Extra attention is therefore needed to identify risky customers. Researchers have designed complex CRP models using artificial intelligence (AI) and statistical techniques to help financial institutions make correct business decisions. Although various statistical and AI methods are available, recent literature shows that ensemble-based CRP models provide better prediction results than single-classifier systems. Even a small increase in the performance of a CRP model can result in a significant improvement in the profit of financial institutions and banks. This work proposes a weight-adjusted boosting ensemble method (WABEM) that uses a rough set (RS)-based feature selection (FS) technique together with balancing and regression-based preprocessing, called RS_RFS-WABEM. Regression is used to fill missing values in the records to improve the performance of CRP. Three credit datasets (Australian, German and Japanese) are chosen to validate the feasibility and effectiveness of the proposed ensemble method. The trade-off between the uncertainty and imprecise probability of the proposed classifier model is evaluated using performance measures such as accuracy and area under the curve. Experimental results show that the proposed ensemble method performs better than other base and ensemble classifier methods.
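To make the rough set feature selection step more concrete, the following is a minimal sketch only. The abstract does not state which reduct algorithm the authors use, so this assumes a greedy forward selection (in the style of QuickReduct) driven by Pawlak's dependency degree, the fraction of records whose class is fully determined by the chosen attributes. The function names and the toy attributes (`income`, `employed`, `owns_home`, `risk`) are hypothetical and do not come from the paper or its datasets.

```python
from collections import defaultdict


def dependency_degree(rows, features, decision):
    """Rough set dependency degree: share of rows lying in the positive region.

    Rows are grouped into equivalence classes by their values on `features`;
    a class counts only if all of its rows share the same decision value.
    """
    if not rows:
        return 0.0
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in features)
        classes[key].append(row[decision])
    consistent = sum(
        len(labels) for labels in classes.values() if len(set(labels)) == 1
    )
    return consistent / len(rows)


def quick_reduct(rows, features, decision):
    """Greedy forward selection (assumed here, not taken from the paper).

    Adds the feature that raises the dependency degree the most until the
    subset matches the dependency degree of the full feature set.
    """
    target = dependency_degree(rows, features, decision)
    selected = []
    current = dependency_degree(rows, selected, decision)
    while current < target:
        best_feature, best_score = None, current
        for f in features:
            if f in selected:
                continue
            score = dependency_degree(rows, selected + [f], decision)
            if score > best_score:
                best_feature, best_score = f, score
        if best_feature is None:  # no single feature improves the score
            break
        selected.append(best_feature)
        current = best_score
    return selected


# Hypothetical toy credit table (categorical attributes only).
toy_rows = [
    {"income": "high", "employed": "yes", "owns_home": "yes", "risk": "good"},
    {"income": "high", "employed": "no", "owns_home": "yes", "risk": "good"},
    {"income": "low", "employed": "yes", "owns_home": "no", "risk": "good"},
    {"income": "low", "employed": "no", "owns_home": "no", "risk": "bad"},
    {"income": "low", "employed": "no", "owns_home": "yes", "risk": "bad"},
]

reduct = quick_reduct(toy_rows, ["income", "employed", "owns_home"], "risk")
print(reduct)
```

On this toy table the greedy search keeps `income` and `employed` and drops `owns_home`, since the first two already determine the class of every record. Real credit data would need continuous attributes to be discretized first, a step this sketch leaves out.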

Author information

Corresponding author

Correspondence to C. Selvi.

Ethics declarations

Conflict of interest

We declare that we have no conflict of interest.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sivasankar, E., Selvi, C. & Mahalakshmi, S. Rough set-based feature selection for credit risk prediction using weight-adjusted boosting ensemble method. Soft Comput 24, 3975–3988 (2020). https://doi.org/10.1007/s00500-019-04167-0

  • DOI: https://doi.org/10.1007/s00500-019-04167-0
