Abstract
Financial institutions have long sought to reduce the risk of consumer loans by improving their credit assessment methods. As new information and network technologies enable massive data collection from many different sources, credit assessment has become a challenge in the big data environment. Complicated processing is required to deal with vast, messy data sources and ever-changing loan regulations. This study proposes a decision tree credit assessment approach (DTCAA) to solve the credit assessment problem in a big data environment. Decision tree models offer good interpretability and easily understood rules, along with competitive performance. In addition, DTCAA features various data consolidation methods that eliminate some of the noise in raw data and facilitate the construction of decision trees. Using a large-volume data set from one of the biggest car collateral loan companies in Taiwan, this study verifies the efficiency and validity of DTCAA. The results indicate that DTCAA is competitive in various situations and across multiple factors, which supports its applicability to credit assessment practice.
Acknowledgments
This research was sponsored by the Ministry of Science and Technology in Taiwan, under project number MOST 103-2410-H-002-099-MY3.
About this article
Cite this article
Chern, CC., Lei, WU., Huang, KL. et al. A decision tree classifier for credit assessment problems in big data environments. Inf Syst E-Bus Manage 19, 363–386 (2021). https://doi.org/10.1007/s10257-021-00511-w
DOI: https://doi.org/10.1007/s10257-021-00511-w