Multi-grained and multi-layered gradient boosting decision tree for credit scoring

Liu, Wan’an; Fan, Hong; Xia, Min

doi:10.1007/s10489-021-02715-6

Published: 10 August 2021

Applied Intelligence, volume 52, pages 5325–5341 (2022)

Abstract

Credit scoring is an important process by which banks and financial institutions manage credit risk. Tree-based ensemble algorithms have made promising progress in credit scoring. However, they lack representation learning, so they cannot adequately capture the underlying distribution of loan data. In this study, we propose a multi-grained and multi-layered gradient boosting decision tree (GBDT) for credit scoring. The multi-layered GBDT combines the explicit learning process of tree-based models with a representation learning ability that helps discriminate good from bad applicants; multi-grained scanning augments the original credit features while enhancing the representation learning ability of the multi-layered GBDT. Experimental results on six credit scoring datasets show that the hierarchical structure effectively reduces the intra-class distance and increases the inter-class distance of the credit scoring datasets. In addition, multi-grained feature augmentation increases the diversity of predictions and further improves performance, yielding more precise credit scoring results.
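
As a rough illustration of two ideas named in the abstract, the Python sketch below shows (a) sliding-window extraction at several window sizes, which is the scanning step usually meant by "multi-grained", and (b) one simple way to measure intra-class and inter-class distance. This is not the authors' implementation. The function names, the use of Euclidean distance, and the centroid-based definition of the two distances are all assumptions made for illustration; the abstract does not specify them, and the step in which each window is passed to a learner is omitted.

```python
from math import dist
from statistics import fmean


def sliding_windows(features, window_size, stride=1):
    """Return every contiguous window of `window_size` features."""
    if not 1 <= window_size <= len(features):
        raise ValueError("window_size must be between 1 and len(features)")
    return [
        features[start:start + window_size]
        for start in range(0, len(features) - window_size + 1, stride)
    ]


def multi_grained_windows(features, window_sizes):
    """Map each grain (window size) to the windows it yields."""
    return {size: sliding_windows(features, size) for size in window_sizes}


def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [fmean(column) for column in zip(*rows)]


def class_distances(rows, labels):
    """Return (mean intra-class distance, mean inter-centroid distance).

    Assumed definitions (not taken from the paper):
      intra = average Euclidean distance of each sample to its class centroid
      inter = average Euclidean distance between pairs of class centroids
    Requires at least two distinct labels.
    """
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    centroids = {label: centroid(members) for label, members in groups.items()}
    intra = fmean(dist(row, centroids[label]) for row, label in zip(rows, labels))
    keys = sorted(centroids)
    inter = fmean(
        dist(centroids[a], centroids[b])
        for i, a in enumerate(keys)
        for b in keys[i + 1:]
    )
    return intra, inter
```

For a four-feature applicant vector, `multi_grained_windows(x, [2, 3])` yields three windows of size 2 and two windows of size 3. Comparing `class_distances` on the raw features with the same measure on a learned representation is one way to check the abstract's claim that the representation pulls classes apart; a lower first value and a higher second value would be consistent with it.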

Acknowledgment

This work was supported by the National Natural Science Foundation of China (71971054) and the Natural Science Foundation of Shanghai (19ZR1402100).

Author information

Authors and Affiliations

Glorious Sun School of Business and Management, Donghua University, Shanghai, 200051, China
Wan’an Liu & Hong Fan
School of Automation, Nanjing University of Information Science & Technology, Nanjing, 210044, China
Min Xia

Corresponding author

Correspondence to Hong Fan.

Ethics declarations

Conflict of Interests

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, W., Fan, H. & Xia, M. Multi-grained and multi-layered gradient boosting decision tree for credit scoring. Appl Intell 52, 5325–5341 (2022). https://doi.org/10.1007/s10489-021-02715-6

Accepted: 17 July 2021
Published: 10 August 2021
Issue Date: March 2022
DOI: https://doi.org/10.1007/s10489-021-02715-6
