Rough set-based feature selection for credit risk prediction using weight-adjusted boosting ensemble method

Sivasankar, E.; Selvi, C.; Mahalakshmi, S.

doi:10.1007/s00500-019-04167-0

Methodologies and Application
Published: 30 July 2019

Volume 24, pages 3975–3988, (2020)

Soft Computing

E. Sivasankar¹,
C. Selvi² &
S. Mahalakshmi³

Abstract

With the tremendous development of financial institutions, credit risk prediction (CRP) plays an essential role in granting loans to customers and helps institutions minimize their losses, because credit approval sometimes results in massive financial loss. Extra attention is therefore needed to identify risky customers. Researchers have designed complex CRP models using artificial intelligence (AI) and statistical techniques to help financial institutions make correct business decisions. Although various statistical and AI methods are available, the recent literature shows that ensemble-based CRP models give better prediction results than single-classifier systems. Even a small increase in the performance of a CRP model can result in a significant improvement in the profit of financial institutions and banks. This work proposes a weight-adjusted boosting ensemble method (WABEM) using a rough set (RS)-based feature selection (FS) technique with balancing and regression-based preprocessing, called RS_RFS-WABEM. Regression is used to fill in missing values in the records to improve the performance of CRP. Three credit datasets (Australian, German and Japanese) are chosen to validate the feasibility and effectiveness of the proposed ensemble method. The trade-off between the uncertainty and imprecise probability of the proposed classifier model is evaluated using performance measures such as accuracy and area under the curve. Experimental results show that the proposed ensemble method performs better than other base and ensemble classifier methods.
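
To make the rough set feature selection step concrete, the sketch below shows one common formulation: a greedy search (often called QuickReduct) that adds attributes until the rough set dependency degree of the subset matches that of the full attribute set. It is only an illustration under stated assumptions, not the RS_RFS-WABEM procedure itself, which the abstract does not specify. The function names, the toy records and the assumption that attributes are already discretized are all hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Dependency degree: share of rows whose equivalence class has one label."""
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)  # block lies in the positive region
    return positive / len(rows)


def quick_reduct(rows, labels):
    """Greedily add the attribute that raises the dependency degree the most."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    selected = []
    while dependency(rows, labels, selected) < target:
        remaining = [a for a in all_attrs if a not in selected]
        best = max(remaining,
                   key=lambda a: dependency(rows, labels, selected + [a]))
        selected.append(best)
    return selected


# Hypothetical, already-discretized applicant records (not from the paper).
rows = [
    ("low", "yes", "a"),
    ("low", "no", "a"),
    ("high", "yes", "b"),
    ("high", "no", "b"),
    ("low", "yes", "b"),
    ("high", "no", "a"),
]
labels = ["good", "bad", "good", "bad", "good", "bad"]

print(quick_reduct(rows, labels))
```

In this toy data the class label is fully determined by the second attribute, so the search stops after selecting that single column. Real credit data would need discretization of numeric attributes first, and the greedy search is not guaranteed to find a minimal reduct.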

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, Tamil Nadu, India
E. Sivasankar
Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India
C. Selvi
Adobe Systems, Noida, India
S. Mahalakshmi

Corresponding author

Correspondence to C. Selvi.

Ethics declarations

Conflict of interest

We declare that we have no conflict of interest.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sivasankar, E., Selvi, C. & Mahalakshmi, S. Rough set-based feature selection for credit risk prediction using weight-adjusted boosting ensemble method. Soft Comput 24, 3975–3988 (2020). https://doi.org/10.1007/s00500-019-04167-0

Published: 30 July 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s00500-019-04167-0
