Abstract
The aim of this work is to assess the importance of the independence assumption in behavioral scoring models built using logistic regression. We develop four sampling methods that control which of the observations associated with each client are included in the training set, avoiding functional dependence between observations of the same client. We then calibrate logistic regressions with variable selection on the samples created by each method, plus one using all the data in the training set (the biased base method), and validate the models on an independent data set. We find that the regression built using all the observations shows the highest area under the ROC curve and Kolmogorov–Smirnov statistic, while the regression that uses the fewest observations shows the lowest performance and the highest variance of these indicators. Nevertheless, the fourth selection algorithm presented shows almost the same performance as the base method while using just 14% of the dataset and 14 fewer variables. We conclude that violating the independence assumption does not strongly affect the results and, furthermore, that trying to control for it by using less data can harm the performance of the calibrated models, although a better sampling method does lead to equivalent results with a far smaller dataset.
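As a rough illustration of the ideas in the abstract, the sketch below shows one possible way to keep a single observation per client and to compute the two validation indicators mentioned (area under the ROC curve and the Kolmogorov–Smirnov statistic). It is not one of the four sampling methods of the paper, which the abstract does not specify; the uniform random choice, the `client_id` field name, and all function names are assumptions made for this example.

```python
import random
from collections import defaultdict


def sample_one_per_client(rows, seed=0):
    """Keep one randomly chosen observation per client (hypothetical scheme).

    `rows` is a list of dicts that each carry a "client_id" key (assumed name).
    """
    rng = random.Random(seed)
    by_client = defaultdict(list)
    for row in rows:
        by_client[row["client_id"]].append(row)
    # One observation per client, so no client contributes repeated rows.
    return [rng.choice(observations) for observations in by_client.values()]


def auc(scores, labels):
    """Area under the ROC curve as P(score_pos > score_neg); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(scores, labels):
    """Largest gap between the empirical score CDFs of the two classes."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for threshold in sorted(set(scores)):
        cdf_pos = sum(s <= threshold for s in pos) / len(pos)
        cdf_neg = sum(s <= threshold for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best
```

In a comparison of the kind described above, a model would be fitted once on the full training set and once on the output of `sample_one_per_client`, and both would be scored with `auc` and `ks_statistic` on the same independent validation set.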
Acknowledgements
The work reported in this paper has been partially funded by the Finance Center, DII, Universidad de Chile, with the support of bank Bci.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Biron, M., Bravo, C. (2014). On the Discriminative Power of Credit Scoring Systems Trained on Independent Samples. In: Spiliopoulou, M., Schmidt-Thieme, L., Janning, R. (eds) Data Analysis, Machine Learning and Knowledge Discovery. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-01595-8_27
DOI: https://doi.org/10.1007/978-3-319-01595-8_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01594-1
Online ISBN: 978-3-319-01595-8
eBook Packages: Mathematics and Statistics (R0)