Default avoidance on credit card portfolios using accounting, demographical and exploratory factors: decision making based on machine learning (ML) techniques

  • S.I.: BALCOR-2017
Annals of Operations Research

Abstract

Effective and thorough credit-risk management is a key factor for lending institutions, as borrowers' default can cause significant financial losses. Machine learning methods, which are attracting increasing attention, can measure and analyze credit risk objectively. This study analyzes default payment data from a credit card portfolio of some 30,000 clients in Taiwan, described by twenty-three attributes with no missing values. We compare the prediction accuracy of seven classification methods: KNN, Logistic Regression, Naïve Bayes, Decision Trees, Random Forest, SVC, and Linear SVC. The results indicate that only a few of the typical variables used can adequately capture default characteristics relevant to lending decisions. The results provide effective feedback to credit evaluators, lending institutions and business analysts for in-depth analysis. They also point to the importance of precautionary borrowing techniques for better understanding credit card borrowers' behavior, along with specific accounting, historical and demographic characteristics.
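The comparison described above can be sketched with scikit-learn, which provides an estimator for each of the seven methods listed. This is a minimal sketch under stated assumptions, not the authors' code: it replaces the credit card data with a synthetic, imbalanced data set from `make_classification`, uses 5-fold stratified cross-validation, picks `GaussianNB` as the Naïve Bayes variant, and uses illustrative hyperparameters (e.g. `n_neighbors=5`, `max_depth=5`). The majority-class `DummyClassifier` is an added reference point that the abstract does not mention.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in (assumption): 23 features and an imbalanced binary
# target where class 1 plays the role of "default".
X, y = make_classification(
    n_samples=600, n_features=23, n_informative=6,
    weights=[0.8, 0.2], random_state=0,
)

# Distance- and margin-based models are wrapped with feature scaling;
# all hyperparameters here are illustrative, not taken from the paper.
MODELS = {
    "Baseline (majority class)": DummyClassifier(strategy="most_frequent"),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVC": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Linear SVC": make_pipeline(StandardScaler(), LinearSVC(dual=False)),
}


def compare(X, y, n_splits=5):
    """Return the mean cross-validated accuracy of every model in MODELS."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in MODELS.items()
    }


if __name__ == "__main__":
    for name, acc in sorted(compare(X, y).items(), key=lambda kv: -kv[1]):
        print(f"{name:28s} {acc:.3f}")
```

Because defaulters are the minority class, a model that always predicts "no default" already scores well on accuracy; comparing each model against that baseline, or switching to `scoring="roc_auc"`, gives a more informative ranking.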

Figs. 1–7 (images not reproduced)

Data availability

The data set is the publicly available "default of credit card clients" data set from the UCI Machine Learning Repository (Lichman 2013): https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients.
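A minimal loading sketch, assuming the UCI file has been exported to CSV with a single header row holding the original column names (`ID`, `LIMIT_BAL`, `SEX`, `EDUCATION`, `MARRIAGE`, `AGE`, the `PAY_*`, `BILL_AMT*` and `PAY_AMT*` series, and the target `default payment next month`). Note that the repayment-status columns run `PAY_0`, `PAY_2`, …, `PAY_6` with no `PAY_1`. The grouping of the 23 attributes into blocks and the helper name `load_csv` are hypothetical illustrations, not necessarily the categorization used in the paper.

```python
import csv
import io

# Hypothetical grouping of the 23 explanatory columns (ID and target excluded).
FEATURE_GROUPS = {
    "credit_limit": ["LIMIT_BAL"],
    "demographic": ["SEX", "EDUCATION", "MARRIAGE", "AGE"],
    # Repayment status over six months: PAY_0, then PAY_2..PAY_6 (no PAY_1).
    "repayment_status": ["PAY_0"] + [f"PAY_{i}" for i in range(2, 7)],
    "bill_amounts": [f"BILL_AMT{i}" for i in range(1, 7)],
    "payment_amounts": [f"PAY_AMT{i}" for i in range(1, 7)],
}
FEATURES = [col for cols in FEATURE_GROUPS.values() for col in cols]
TARGET = "default payment next month"


def load_csv(text):
    """Parse CSV text into (X, y) after checking that every expected column exists."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in FEATURES + [TARGET] if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    X, y = [], []
    for row in reader:
        # float() raises ValueError on an empty cell, so a missing value
        # would surface immediately instead of being silently imputed.
        X.append([float(row[c]) for c in FEATURES])
        y.append(int(row[TARGET]))
    return X, y


if __name__ == "__main__":
    header = ",".join(["ID"] + FEATURES + [TARGET])
    sample = ",".join(["1"] + ["0"] * len(FEATURES) + ["1"])
    X, y = load_csv(header + "\n" + sample + "\n")
    print(len(FEATURES), len(X[0]), y)  # prints: 23 23 [1]
```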

References

  • Aha, D. (1992). Tolerating noisy, irrelevant, and novel attributes in instance-based learning algorithms. International Journal of Man–Machine Studies, 36(2), 267–287.

  • Ajay, V., & Shomona, G. J. (2016). Prediction of credit-card defaulters: a comparative study on performance of classifiers. International Journal of Computer Applications (0975–8887), 145(7), 36–41.

  • Ben-Hur, A., Horn, D., Siegelmann, H. T., & Vapnik, V. (2001). Support vector clustering. Journal of Machine Learning Research, 2, 125–137.

  • Bhaduri, A. (2009). Credit scoring using artificial immune system algorithms: a comparative study. In Proceedings of the world congress on nature and biologically inspired computing NaBIC2009, Coimbatore (pp. 1540–1543).

  • Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.

  • Cheng, D., Zhang, S., Deng, Z., Zhu, Y., & Zong, M. (2014). kNN algorithm with data-driven k value. In X. Luo, J. X. Yu, & Z. Li (Eds.), Advanced data mining and applications. ADMA 2014. Lecture Notes in Computer Science (Vol. 8933). Berlin: Springer.

  • Davis, R. H., Edelman, D. B., & Gammerman, A. J. (1992). Machine-learning algorithms for credit-card applications. Journal of Management Mathematics, 4(1), 43–51.

  • Dimitras, A., Papadakis, S., & Garefalakis, A. (2017). Evaluation of empirical attributes for credit risk forecasting from numerical data. Investment Management and Financial Innovations, 14(1), 9–18. https://doi.org/10.21511/imfi.14(1).2017.01.

  • Frank, E., & Witten, I. H. (1998). Generating accurate rule sets without global optimization. In J. Shavlik (Ed.), Proceedings of the fifteenth international conference on machine learning, Madison, WI. San Francisco: Morgan Kaufmann (pp. 144–151).

  • Frank, E., & Hall, M. (2001). A simple approach to ordinal classification. In L. de Raedt, & P. A. Flach (Eds.), Proceedings of the twelfth European conference on machine learning, Freiburg, Germany. Berlin: Springer (pp. 145–156).

  • Hamori, S., Kawai, M., Kume, T., Murakami, Y., & Watanabe, Y. (2018). Ensemble learning or deep learning? Application to default risk analysis. Journal of Risk and Financial Management, 11(1), 12. https://doi.org/10.3390/jrfm11010012.

  • Hand, D. J., & Henley, W. E. (1996). A k-nearest-neighbour classifier for assessing consumer credit risk. The Statistician, 45(1), 77–95.

  • He, J., Liu, X., Shi, Y., Xu, W., & Yan, N. (2004). Classifications of credit cardholder behavior by using fuzzy linear programming. International Journal of Information Technology and Decision Making, 3(4), 633–650.

  • Jenhani, I., Ben Amor, N., & Elouedi, Z. (2008). Decision trees as possibilistic classifiers (Special Section on Choquet Integration in honor of Gustave Choquet (1915–2006) and Special Section on Nonmonotonic and Uncertain Reasoning). International Journal of Approximate Reasoning, 48(3), 784–807.

  • Khandani, A. E., Kim, A. J., & Lo, A. W. (2010). Consumer credit risk models via machine-learning algorithms. AFA 2011 Denver Meetings Paper. https://doi.org/10.2139/ssrn.1568864.

  • Krichene, A. (2017). Using a naive Bayesian classifier methodology for loan risk assessment evidence from a Tunisian commercial bank. Journal of Economics, Finance and Administrative Science, 22(42), 3–24.

  • Landwehr, N., Hall, M., & Frank, E. (2003). Logistic model trees. In N. Lavrac, D. Gamberger, L. Todorovski, & H. Blockeel (Eds.), Proceedings of the fourteenth European conference on machine learning, Cavtat-Dubrovnik, Croatia. Berlin: Springer (pp. 241–252).

  • Lee, T. S., Chiu, C. C., Chou, Y. C., & Lu, C. J. (2006). Mining the customer credit using classification and regression tree and multivariate adaptive regression splines. Computational Statistics and Data Analysis, 50, 1113–1130.

  • Lichman, M. (2013). UCI Machine Learning Repository. Irvine: University of California, School of Information and Computer Science. http://archive.ics.uci.edu/ml.

  • Makalic, E., & Schmidt, D. F. (2010). Review of modern logistic regression methods with application to small and medium sample size problems. In J. Li (Ed.), AI 2010: advances in artificial intelligence. AI 2010. Lecture Notes in Computer Science (Vol. 6464). Berlin: Springer.

  • Marinakis, Y., Marinaki, M., Doumpos, M., & Zopounidis, C. (2009). Ant colony and particle swarm optimization for financial classification problems. Expert Systems with Applications, 36, 10604–10611.

  • Neema, S., & Soibam, B. (2017). The comparison of machine learning methods to achieve most cost-effective prediction for credit card default. Journal of Management Science and Business Intelligence, 2(2), 36–41.

  • Peng, Y., Kou, G., Chen, Z., & Shi, Y. (2004). Cross-validation and ensemble analyses on multiple-criteria linear programming classification for credit cardholder behavior, Lecture Notes in Computer Science, ICCS 2004 (Vol. 3039, pp. 931–939).

  • Quinlan, J., Rajendra, G., & Castro, D. (1998). Bank collateralised loan obligations: From 0 to 60 in less than 2 years? Merrill Lynch, Global Securities Research & Economics Group, March.

  • Ramoni, M., & Sebastiani, P. (2001). Robust Bayes classifiers. Artificial Intelligence, 125(1–2), 209–226.

  • Shen, A., Tong, R., & Deng, Y. (2007). Application of classification models on credit card fraud detection. In Proceedings of the international conference on service systems and service management, Chengdu (pp. 1–4).

  • Shi, Y., Peng, Y., Kou, G., & Chen, Z. (2005). Classifying credit card accounts for business intelligence and decision making: A multiple-criteria quadratic programming approach. International Journal of Information Technology and Decision Making, 4(4), 581–599.

  • Shomona, J. G., & Ramani, R. G. (2011). Discovery of knowledge patterns in clinical data through data mining algorithms: Multi-class categorization of breast tissue data. International Journal of Computer Applications, 32(7), 46–53.

  • Srinivasan, V., & Kim, Y. H. (1987). Credit granting: A comparative analysis of classification procedures. The Journal of Finance, 42(3), 665–681.

  • Stone, M. (1974). Cross-validation choice and assessment of statistical predictions. Journal of the Royal Statistical Society B, 36, 111–147.

  • Watanabe, C. Y. V., Ribeiro, M. X., Traina, C., & Traina, A. J. M. (2011). SACMiner: A new classification method based on statistical association rules to mine medical images. In J. Filipe, & J. Cordeiro (Eds.), Enterprise information systems. ICEIS 2010. Lecture Notes in Business Information Processing (Vol. 73). Berlin: Springer.

  • Yeh, I.-C., & Lien, C. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2, Part 1), 2473–2480.

Acknowledgements

The current publication is based on the UCI credit card default data set (Lichman 2013). We would also like to thank the Laboratory of Artificial Intelligence Systems and Computer Architectures of the Technological Educational Institute of Crete for providing the computing power needed to complete the extensive experiments for this work.

Author information

Corresponding author

Correspondence to Nikolaos Sariannidis.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sariannidis, N., Papadakis, S., Garefalakis, A. et al. Default avoidance on credit card portfolios using accounting, demographical and exploratory factors: decision making based on machine learning (ML) techniques. Ann Oper Res 294, 715–739 (2020). https://doi.org/10.1007/s10479-019-03188-0

  • DOI: https://doi.org/10.1007/s10479-019-03188-0
