A decision tree classifier for credit assessment problems in big data environments

  • Original Article
  • Information Systems and e-Business Management

Abstract

Financial institutions have long sought to reduce the risk of consumer loans by improving their credit assessment methods. As new information and network technologies enable massive data collection from many different sources, credit assessment in the big data environment has become a challenge: complicated processing is required to deal with vast, messy data sources and ever-changing loan regulations. This study proposes a decision tree credit assessment approach (DTCAA) to solve the credit assessment problem in a big data environment. Decision tree models offer good interpretability and easily understood rules while delivering competitive performance. In addition, DTCAA features various data consolidation methods that eliminate some of the noise in raw data and facilitate the construction of decision trees. Using a large data set from one of the biggest car collateral loan companies in Taiwan, this study verifies the efficiency and validity of DTCAA. The results indicate that DTCAA is competitive across various situations and multiple factors, supporting its applicability to credit assessment practice.
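The abstract does not spell out DTCAA's induction algorithm or its consolidation methods, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a classic ID3-style tree (information-gain splits on categorical attributes) and illustrates "data consolidation" as coarse binning of raw numeric fields. The attribute names (`income_band`, `vehicle_age_band`, `late_history`), the cut points, and the eight loan records are hypothetical toy values invented for illustration; they are not drawn from the Taiwanese car collateral loan data set.

```python
import math
from collections import Counter

# Hypothetical consolidation step: collapse a raw numeric field into a coarse band.
# Cut points are illustrative assumptions, not values from the study.
def to_band(value, cuts):
    """Return the index of the first cut point that value falls below."""
    for i, cut in enumerate(cuts):
        if value < cut:
            return i
    return len(cuts)

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on a categorical attribute."""
    parts = {}
    for row, label in zip(rows, labels):
        parts.setdefault(row[attr], []).append(label)
    n = len(labels)
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    """ID3-style induction: greedily split on the highest-gain attribute."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # leaf: pure node or no attributes left
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    groups = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = groups.setdefault(row[best], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    branches = {v: build_tree(r, l, remaining) for v, (r, l) in groups.items()}
    return {"attr": best, "branches": branches, "default": majority}

def predict(tree, row):
    """Walk the tree; fall back to the node's majority class for unseen values."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree

def rules(tree, path=()):
    """Yield one human-readable IF ... THEN ... rule per leaf."""
    if not isinstance(tree, dict):
        cond = " AND ".join(f"{a} = {v}" for a, v in path) or "TRUE"
        yield f"IF {cond} THEN {tree}"
        return
    for value, subtree in sorted(tree["branches"].items(), key=lambda kv: str(kv[0])):
        yield from rules(subtree, path + ((tree["attr"], value),))

# Hypothetical toy records: (monthly_income, vehicle_age_years, late_payments, label)
RAW = [
    (25000, 12, 2, "bad"),
    (35000, 3, 0, "good"),
    (55000, 8, 1, "bad"),
    (60000, 4, 0, "good"),
    (90000, 11, 0, "good"),
    (95000, 2, 1, "good"),
    (45000, 14, 0, "bad"),
    (30000, 7, 0, "good"),
]

# Consolidate raw fields into coarse categorical attributes before induction.
rows = [
    {
        "income_band": to_band(income, [40000, 80000]),
        "vehicle_age_band": to_band(age, [5, 10]),
        "late_history": "yes" if late > 0 else "no",
    }
    for income, age, late, _ in RAW
]
labels = [label for *_, label in RAW]
tree = build_tree(rows, labels, ["income_band", "vehicle_age_band", "late_history"])

if __name__ == "__main__":
    for rule in rules(tree):
        print(rule)
```

Running the script prints one IF ... THEN rule per leaf, which is the kind of readable rule set the abstract credits decision trees with. A production credit model would also need pruning, validation on held-out data, and a considered policy for attribute values never seen in training; the majority-class `default` fallback in `predict` is just one simple assumption for that case.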

Acknowledgments

This research was sponsored by the Ministry of Science and Technology in Taiwan, under project number MOST 103-2410-H-002-099-MY3.

Author information

Corresponding author

Correspondence to Kwei-Long Huang.

About this article

Cite this article

Chern, CC., Lei, WU., Huang, KL. et al. A decision tree classifier for credit assessment problems in big data environments. Inf Syst E-Bus Manage 19, 363–386 (2021). https://doi.org/10.1007/s10257-021-00511-w

  • DOI: https://doi.org/10.1007/s10257-021-00511-w