On the Discriminative Power of Credit Scoring Systems Trained on Independent Samples

Biron, Miguel; Bravo, Cristián

doi:10.1007/978-3-319-01595-8_27

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

The aim of this work is to assess the importance of the independence assumption in behavioral scoring models built with logistic regression. We develop four sampling methods that control which of the observations associated with each client are included in the training set, avoiding a functional dependence between observations of the same client. We then calibrate logistic regressions with variable selection on the samples created by each method, plus one using all the data in the training set (the biased base method), and validate the models on an independent data set. We find that the regression built using all the observations shows the highest area under the ROC curve and Kolmogorov–Smirnov statistics, while the regression that uses the fewest observations shows the lowest performance and the highest variance of these indicators. Nevertheless, the fourth selection algorithm presented shows almost the same performance as the base method using just 14% of the dataset and 14 fewer variables. We conclude that violating the independence assumption does not strongly affect the results and, furthermore, that trying to control it by using less data can harm the performance of the calibrated models, although a better sampling method does lead to equivalent results with a far smaller dataset.
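
To make the setup concrete, the following is a minimal sketch of the kind of pipeline the abstract refers to. It is not the authors' method: the abstract does not specify the four sampling algorithms, so the sampler here simply assumes one uniformly random observation per client. The field name `client_id`, the function names, and the convention that a label of 1 marks a default and a higher score means higher estimated risk are all illustrative assumptions. The two indicators named in the abstract, area under the ROC curve and the Kolmogorov–Smirnov statistic, are computed directly from scores and labels.

```python
import random
from collections import defaultdict


def sample_one_per_client(rows, seed=0):
    """Keep one randomly chosen observation per client.

    Assumption: `rows` is a list of dicts with a hypothetical "client_id" key.
    This uniform pick is a stand-in, not one of the paper's four methods.
    """
    rng = random.Random(seed)
    by_client = defaultdict(list)
    for row in rows:
        by_client[row["client_id"]].append(row)
    # Iterate over sorted client ids so the result is reproducible per seed.
    return [rng.choice(by_client[cid]) for cid in sorted(by_client)]


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(scores, labels):
    """Largest gap between the empirical score CDFs of the two classes."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Toy data: client "a" has two monthly observations, client "b" has one.
rows = [
    {"client_id": "a", "month": 1, "default": 0},
    {"client_id": "a", "month": 2, "default": 0},
    {"client_id": "b", "month": 1, "default": 1},
]
training_sample = sample_one_per_client(rows)
```

In a full experiment, a logistic regression would be fitted on `training_sample` and on the complete `rows`, and `auc` and `ks_statistic` would then be compared on a separate validation set, mirroring the comparison the abstract reports.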

Acknowledgements

The work reported in this paper has been partially funded by the Finance Center, DII, Universidad de Chile, with the support of bank Bci.

Author information

Authors and Affiliations

Department of Industrial Engineering, Universidad de Chile, República 701, 8370439, Santiago, Chile
Miguel Biron
Finance Center, Department of Industrial Engineering, Universidad de Chile, Domeyko 2369, 8370397, Santiago, Chile
Cristián Bravo

Corresponding author

Correspondence to Miguel Biron.

Editor information

Editors and Affiliations

Faculty of Computer Science, Otto-von-Guericke-Universität Magdeburg, Magdeburg, Germany
Myra Spiliopoulou
Institute of Computer Science, University of Hildesheim, Hildesheim, Germany
Lars Schmidt-Thieme
Institute of Computer Science, University of Hildesheim, Hildesheim, Germany
Ruth Janning

Cite this paper

Biron, M., Bravo, C. (2014). On the Discriminative Power of Credit Scoring Systems Trained on Independent Samples. In: Spiliopoulou, M., Schmidt-Thieme, L., Janning, R. (eds) Data Analysis, Machine Learning and Knowledge Discovery. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-01595-8_27

DOI: https://doi.org/10.1007/978-3-319-01595-8_27
Published: 10 October 2013
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01594-1
Online ISBN: 978-3-319-01595-8
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)
