Abstract
Effective and thorough credit-risk management is a key factor for lending institutions, as borrower default can cause significant financial losses. Machine learning methods, which can measure and analyze credit risk objectively, are consequently attracting increasing attention. This study analyzes default payment data from a credit-card portfolio of some 30,000 clients in Taiwan, described by twenty-three attributes with no missing values. We compare the prediction accuracy of seven classification methods: KNN, Logistic Regression, Naïve Bayes, Decision Trees, Random Forest, SVC, and Linear SVC. The results indicate that only a few of the variables typically used can adequately characterize default for lending decisions. The results provide effective feedback to credit evaluators, lending institutions and business analysts for in-depth analysis. They also point to the importance of precautionary borrowing techniques for better understanding credit-card borrowers’ behavior, along with specific accounting, historical and demographic characteristics.
Data availability
The data set is based on the publicly available credit card default data set from the UCI Machine Learning Repository. Details are here: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients.
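As an illustration only, the comparison of seven classification methods named in the abstract could be sketched with scikit-learn as below. This is not the authors' code: the choice of scikit-learn, the Gaussian variant of Naïve Bayes, the feature scaling, all hyperparameters, the 5-fold cross-validation, and the helper names build_models and compare_accuracy are assumptions. The data are a synthetic stand-in with twenty-three features, so the printed accuracies say nothing about the real portfolio.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier


def build_models(seed=0):
    """Return one estimator per method named in the abstract.

    All settings here are assumed for illustration; the paper's
    actual configurations are not stated in this section.
    """
    return {
        # Distance- and margin-based models are scaled first (an assumption).
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Naive Bayes": GaussianNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=50, random_state=seed),
        "SVC": make_pipeline(StandardScaler(), SVC()),
        "Linear SVC": make_pipeline(
            StandardScaler(), LinearSVC(max_iter=5000, random_state=seed)
        ),
    }


def compare_accuracy(X, y, folds=5, seed=0):
    """Return the mean k-fold cross-validated accuracy of each model."""
    return {
        name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
        for name, model in build_models(seed).items()
    }


if __name__ == "__main__":
    # Synthetic stand-in with 23 features; NOT the UCI credit card data.
    X, y = make_classification(
        n_samples=600, n_features=23, n_informative=6, random_state=0
    )
    ranked = sorted(compare_accuracy(X, y).items(), key=lambda kv: -kv[1])
    for name, acc in ranked:
        print(f"{name:<20s} {acc:.3f}")
```

To run the same comparison on the real data, X and y would be replaced by the twenty-three attributes and the default indicator from the repository file. Accuracy alone can be misleading when defaults are much rarer than non-defaults, so a fuller analysis would probably report additional metrics as well.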
References
Aha, D. (1992). Tolerating noisy, irrelevant, and novel attributes in instance-based learning algorithms. International Journal of Man–Machine Studies, 36(2), 267–287.
Ajay, V., & Shomona, G. J. (2016). Prediction of credit-card defaulters: a comparative study on performance of classifiers. International Journal of Computer Applications (0975–8887), 145(7), 36–41.
Ben-Hur, A., Horn, D., Siegelmann, H. T., & Vapnik, V. (2001). Support vector clustering. Journal of Machine Learning Research, 2, 125–137.
Bhaduri, A. (2009). Credit scoring using artificial immune system algorithms: a comparative study. In Proceedings of the world congress on nature and biologically inspired computing NaBIC2009, Coimbatore (pp. 1540–1543).
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Cheng, D., Zhang, S., Deng, Z., Zhu, Y., & Zong, M. (2014). kNN algorithm with data-driven k value. In X. Luo, J. X. Yu, & Z. Li (Eds.), Advanced data mining and applications. ADMA 2014. Lecture Notes in Computer Science (Vol. 8933). Berlin: Springer.
Davis, R. H., Edelman, D. B., & Gammerman, A. J. (1992). Machine-learning algorithms for credit-card applications. Journal of Management Mathematics, 4(1), 43–51.
Dimitras, A., Papadakis, S., & Garefalakis, A. (2017). Evaluation of empirical attributes for credit risk forecasting from numerical data. Investment Management and Financial Innovations, 14(1), 9–18. https://doi.org/10.21511/imfi.14(1).2017.01.
Frank, E., & Witten, I. H. (1998). Generating accurate rule sets without global optimization. In J. Shavlik (Ed.), Proceedings of the fifteenth international conference on machine learning, Madison, WI. San Francisco: Morgan Kaufmann (pp. 144–151).
Frank, E., & Hall, M. (2001). A simple approach to ordinal classification. In L. de Raedt, & P. A. Flach (Eds.), Proceedings of the twelfth European conference on machine learning, Freiburg, Germany. Berlin: Springer (pp. 145–156).
Hamori, S., Kawai, M., Kume, T., Murakami, Y., & Watanabe, Y. (2018). Ensemble learning or deep learning? Application to default risk analysis. Journal of Risk and Financial Management, 11(1), 12. https://doi.org/10.3390/jrfm11010012.
Hand, D. J., & Henley, W. E. (1996). A k-nearest-neighbour classifier for assessing consumer credit risk. The Statistician, 45(1), 77–95.
He, J., Liu, X., Shi, Y., Xu, W., & Yan, N. (2004). Classifications of credit cardholder behavior by using fuzzy linear programming. International Journal of Information Technology and Decision Making, 3(4), 633–650.
Jenhani, I., Nahla, B. A., & Ziedm, E. (2008). Decision trees as possibilistic classifiers (Special Section on Choquet Integration in honor of Gustave Choquet (1915–2006) and Special Section on Nonmonotonic and Uncertain Reasoning). International Journal of Approximate Reasoning, 48(3), 784–807.
Khandani, A. E., Kim, A. J., & Lo, A. W. (2010). Consumer credit risk models via machine-learning algorithms. AFA 2011 Denver Meetings Paper. https://doi.org/10.2139/ssrn.1568864.
Krichene, A. (2017). Using a naive Bayesian classifier methodology for loan risk assessment evidence from a Tunisian commercial bank. Journal of Economics, Finance and Administrative Science, 22(42), 3–24.
Landwehr, N., Hall, M., & Frank, E. (2003). Logistic model trees. In N. Lavrac, D. Gamberger, L. Todorovski, & H. Blockeel (Eds.), Proceedings of the fourteenth European conference on machine learning, Cavtat-Dubrovnik, Croatia. Berlin: Springer (pp. 241–252).
Lee, T. S., Chiu, C. C., Chou, Y. C., & Lu, C. J. (2006). Mining the customer credit using classification and regression tree and multivariate adaptive regression splines. Computational Statistics and Data Analysis, 50, 1113–1130.
Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine: University of California, School of Information and Computer Science. The original dataset can be found at the UCI Machine Learning Repository, i.e. https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
Makalic, E., & Schmidt, D. F. (2010). Review of modern logistic regression methods with application to small and medium sample size problems. In J. Li (Ed.), AI 2010: Advances in artificial intelligence. AI 2010. Lecture Notes in Computer Science (Vol. 6464). Berlin: Springer.
Marinakis, Y., Marinaki, M., Doumpos, M., & Zopounidis, C. (2009). Ant colony and particle swarm optimization for financial classification problems. Expert Systems with Applications, 36, 10604–10611.
Neema, S., & Soibam, B. (2017). The comparison of machine learning methods to achieve most cost-effective prediction for credit card default. Journal of Management Science and Business Intelligence, 2(2), 36–41.
Peng, Y., Kou, G., Chen, Z., & Shi, Y. (2004). Cross-validation and ensemble analyses on multiple-criteria linear programming classification for credit cardholder behavior, Lecture Notes in Computer Science, ICCS 2004 (Vol. 3039, pp. 931–939).
Quinlan, J., Rajendra, G., & Castro, D. (1998). Bank collateralised loan obligations: From 0 to 60 in less than 2 years? Merrill Lynch, Global Securities Research & Economics Group, March.
Ramoni, M., & Sebastiani, P. (2001). Robust Bayes classifiers. Artificial Intelligence, 125(1–2), 209–226.
Shen, A., Tong, R., & Deng, Y. (2007). Application of classification models on credit card fraud detection. In Proceedings of the international conference on service systems and service management, Chengdu (pp. 1–4).
Shi, Y., Peng, Y., Kou, G., & Chen, Z. (2005). Classifying credit card accounts for business intelligence and decision making: A multiple-criteria quadratic programming approach. International Journal of Information Technology and Decision Making, 4(4), 581–599.
Shomona, J. G., & Ramani, R. G. (2011). Discovery of knowledge patterns in clinical data through data mining algorithms: Multi-class categorization of breast tissue data. International Journal of Computer Applications, 32(7), 46–53.
Srinivasan, V., & Kim, Y. H. (1987). Credit granting: A comparative analysis of classification procedures. The Journal of Finance, 42(3), 665–681.
Stone, M. (1974). Cross-validation choice and assessment of statistical predictions. Journal of the Royal Statistical Society B, 36, 111–147.
Watanabe, C. Y. V., Ribeiro, M. X., Traina, C., & Traina, A. J. M. (2011). SACMiner: A new classification method based on statistical association rules to mine medical images. In J. Filipe & J. Cordeiro (Eds.), Enterprise information systems. ICEIS 2010. Lecture Notes in Business Information Processing (Vol. 73). Berlin: Springer.
Yeh, I.-C., & Lien, C. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2, Part 1), 2473–2480.
Acknowledgements
The current publication is based on the following dataset: Lichman (2013). We would also like to thank the Laboratory of Artificial Intelligence Systems and Computer Architectures of the Technological Educational Institute of Crete for providing the computing power needed to carry out the extensive experiments for this work.
About this article
Cite this article
Sariannidis, N., Papadakis, S., Garefalakis, A. et al. Default avoidance on credit card portfolios using accounting, demographical and exploratory factors: decision making based on machine learning (ML) techniques. Ann Oper Res 294, 715–739 (2020). https://doi.org/10.1007/s10479-019-03188-0
DOI: https://doi.org/10.1007/s10479-019-03188-0