Multi-grained and multi-layered gradient boosting decision tree for credit scoring

Applied Intelligence

Abstract

Credit scoring is an important process by which banks and financial institutions manage credit risk. Tree-based ensemble algorithms have made promising progress in credit scoring. However, they lack representation learning, so they cannot adequately express the underlying distribution of loan data. In this study, we propose a multi-grained and multi-layered gradient boosting decision tree (GBDT) for credit scoring. The multi-layered GBDT combines the explicit learning process of tree-based models with the representation learning ability needed to discriminate between good and bad applicants; multi-grained scanning augments the original credit features and further enhances the representation learning ability of the multi-layered GBDT. Experimental results on six credit scoring datasets show that the hierarchical structure effectively reduces the intra-class distance and increases the inter-class distance of the credit scoring datasets. In addition, multi-grained feature augmentation increases the diversity of predictions and further improves credit scoring performance, yielding more precise results.
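The two ideas named in the abstract, scanning features at several granularities and measuring intra-class versus inter-class distance, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, window sizes, and toy data are hypothetical, and the step that would feed each window to a GBDT to produce augmented features is omitted. The sliding-window scheme follows the general multi-grained scanning idea from deep forest and is assumed here, since the preview does not describe the paper's exact procedure.

```python
from itertools import combinations
from math import dist
from statistics import fmean


def sliding_windows(features, window_size, stride=1):
    """Return every contiguous slice of `window_size` features (one grain)."""
    last_start = len(features) - window_size
    return [features[i:i + window_size]
            for i in range(0, last_start + 1, stride)]


def multi_grained_windows(features, window_sizes):
    """Scan one feature vector at several granularities (hypothetical helper)."""
    return {w: sliding_windows(features, w) for w in window_sizes}


def mean_intra_class_distance(points, labels):
    """Average Euclidean distance over pairs of samples with the same label."""
    pairs = [dist(a, b)
             for (a, la), (b, lb) in combinations(zip(points, labels), 2)
             if la == lb]
    return fmean(pairs)


def mean_inter_class_distance(points, labels):
    """Average Euclidean distance over pairs of samples with different labels."""
    pairs = [dist(a, b)
             for (a, la), (b, lb) in combinations(zip(points, labels), 2)
             if la != lb]
    return fmean(pairs)


# Toy applicant with five made-up feature values, scanned with windows of 2 and 3.
applicant = [0.2, 0.5, 0.1, 0.9, 0.4]
grains = multi_grained_windows(applicant, window_sizes=[2, 3])

# Toy 2-D representations: label 0 = good applicant, label 1 = bad applicant.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = [0, 0, 1, 1]
intra = mean_intra_class_distance(points, labels)
inter = mean_inter_class_distance(points, labels)
print(len(grains[2]), len(grains[3]), intra, inter)
```

A representation is more separable when the intra-class value is small relative to the inter-class value, which is the property the abstract reports for the hierarchical structure.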

Acknowledgment

This work was supported by the National Natural Science Foundation of China (71971054) and the Natural Science Foundation of Shanghai (19ZR1402100).

Author information

Corresponding author

Correspondence to Hong Fan.

Ethics declarations

Conflict of Interests

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, W., Fan, H. & Xia, M. Multi-grained and multi-layered gradient boosting decision tree for credit scoring. Appl Intell 52, 5325–5341 (2022). https://doi.org/10.1007/s10489-021-02715-6

  • DOI: https://doi.org/10.1007/s10489-021-02715-6
