Abstract
Credit scoring is an important process by which banks and financial institutions manage credit risk. Tree-based ensemble algorithms have made promising progress in credit scoring. However, they lack representation learning, so they cannot adequately capture the underlying distribution of loan data. In this study, we propose a multi-grained and multi-layered gradient boosting decision tree (GBDT) for credit scoring. Multi-layered GBDT combines the explicit learning process of tree-based models with the representation learning ability needed to discriminate between good and bad applicants; multi-grained scanning augments the original credit features while enhancing the representation learning ability of multi-layered GBDT. Experimental results on six credit scoring datasets show that the hierarchical structure effectively reduces the intra-class distance and increases the inter-class distance of the credit scoring datasets. In addition, multi-grained feature augmentation effectively increases the diversity of predictions and further improves credit scoring performance, yielding more precise credit scoring results.
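As a rough illustration of the scanning step only, the sketch below assumes that multi-grained scanning means sliding windows of several sizes over an applicant's feature vector. The function names, window sizes, and stride are hypothetical and are not taken from the paper. The later step, in which each sub-vector would be passed to a trained model and the outputs appended to the original features, is omitted.

```python
from typing import Dict, List, Sequence


def sliding_windows(features: Sequence[float], window: int,
                    stride: int = 1) -> List[List[float]]:
    """Return every contiguous sub-vector of length `window`, stepping by `stride`."""
    if window < 1 or window > len(features):
        raise ValueError("window must be between 1 and len(features)")
    last_start = len(features) - window
    return [list(features[i:i + window]) for i in range(0, last_start + 1, stride)]


def multi_grained_windows(features: Sequence[float],
                          window_sizes: Sequence[int]) -> Dict[int, List[List[float]]]:
    """Scan one feature vector at several granularities (hypothetical helper).

    Each window size yields its own list of sub-vectors; smaller windows
    capture finer-grained feature groups, larger windows coarser ones.
    """
    return {w: sliding_windows(features, w) for w in window_sizes}


# Example: one applicant with five (made-up) numeric features.
applicant = [1, 2, 3, 4, 5]
grains = multi_grained_windows(applicant, window_sizes=[2, 4])
```

With these inputs, the window of size 2 produces four sub-vectors and the window of size 4 produces two, so a single applicant contributes sub-vectors at two granularities.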
Acknowledgment
This work is the result of a research project funded by the National Natural Science Foundation of China (71971054) and the Natural Science Foundation of Shanghai (19ZR1402100).
Ethics declarations
Conflict of Interests
The authors declare that there is no conflict of interest regarding the publication of this paper.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Liu, W., Fan, H. & Xia, M. Multi-grained and multi-layered gradient boosting decision tree for credit scoring. Appl Intell 52, 5325–5341 (2022). https://doi.org/10.1007/s10489-021-02715-6
DOI: https://doi.org/10.1007/s10489-021-02715-6