Abstract
We evaluate several approaches to classifying loan applications that deliver their final result as a single decision tree, i.e., in a form widely regarded as interpretable by humans. We use state-of-the-art credit scoring classification algorithms, such as logistic regression, gradient boosting decision trees and random forests, as components of the proposed decision tree building algorithms. The experiments use four real-world loan default prediction data sets of different sizes. We evaluate the proposed methods using the area under the receiver operating characteristic curve (AUC), and we also measure the models’ interpretability. We verify the significance of the differences between the AUC values obtained with the compared techniques by computing Friedman’s statistic and performing Nemenyi’s post-hoc test.
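The significance check named in the abstract can be illustrated with a minimal sketch. It ranks the methods on each data set by AUC, computes Friedman's chi-square statistic from the average ranks, and derives the Nemenyi critical difference, using the formulation commonly given for comparing classifiers over multiple data sets. The method names and AUC values below are hypothetical placeholders, not results from the paper, and the function names are assumptions introduced only for this illustration. The statistic is the basic form without a tie correction.

```python
import math

# Hypothetical placeholder AUC values (NOT results from the paper).
# Rows are data sets, columns are the compared methods.
methods = ["method_a", "method_b", "method_c"]
auc = [
    [0.70, 0.75, 0.80],
    [0.60, 0.65, 0.62],
    [0.81, 0.85, 0.90],
    [0.72, 0.71, 0.78],
]


def rank_row(row):
    """Rank one row of scores: 1 = highest AUC, ties share the average rank."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Return (chi-square statistic, average rank per method)."""
    n, k = len(table), len(table[0])
    rank_rows = [rank_row(r) for r in table]
    avg_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )
    return chi2, avg_ranks


# Critical values q_alpha for the two-tailed Nemenyi test at alpha = 0.05,
# indexed by the number of compared methods k.
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728}


def nemenyi_cd(k, n, q_table=Q_ALPHA_005):
    """Critical difference between average ranks for k methods on n data sets."""
    return q_table[k] * math.sqrt(k * (k + 1) / (6 * n))


chi2, avg_ranks = friedman_statistic(auc)
cd = nemenyi_cd(len(methods), len(auc))

print("Friedman chi-square:", round(chi2, 3))
for name, r in zip(methods, avg_ranks):
    print(name, "average rank:", r)
print("Nemenyi critical difference:", round(cd, 3))
```

Two methods would be reported as significantly different when their average ranks differ by at least the critical difference. With only four data sets the critical difference is large, so such a test has limited power.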
Acknowledgments
This work was supported by the Polish National Science Centre, grant DEC-2011/01/D/ST6/06788, and by Poznan University of Technology under grant 04/45/DSPB/0185.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Szwabe, A., Misiorek, P. (2018). Decision Trees as Interpretable Bank Credit Scoring Models. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Facing the Challenges of Data Proliferation and Growing Variety. BDAS 2018. Communications in Computer and Information Science, vol 928. Springer, Cham. https://doi.org/10.1007/978-3-319-99987-6_16
DOI: https://doi.org/10.1007/978-3-319-99987-6_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99986-9
Online ISBN: 978-3-319-99987-6
eBook Packages: Computer Science, Computer Science (R0)