Semi-supervised adapted HMMs for P2P credit scoring systems with reject inference

  • Original paper
  • Computational Statistics

Abstract

Most current credit-scoring models used for loan approval are built only on information from accepted applicants, whose repayment outcome is known. This produces selection bias: because rejected applications are excluded, the training sample is not representative of the full population of applicants. The bias undermines the validity of these models from both a statistical and an economic point of view, especially for models used on peer-to-peer lending platforms, where rejection rates are extremely high. Reject inference is the practice of inferring information about rejected applicants when constructing a credit-scoring model. This study proposes a semi-supervised learning framework based on hidden Markov models (SSHMM) as a novel method of reject inference. Real data from the Lending Club platform, the most used online lending marketplace in the United States as well as the rest of the world, are used to evaluate the effectiveness of our method against existing approaches. The results illustrate the proposed method’s superiority, stability, and adaptability.
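The selection bias described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's SSHMM and uses no Lending Club data: the latent risk score, the logistic default probability, and the acceptance threshold of -0.5 are all hypothetical choices made only for illustration. It shows that when a lender accepts only low-risk applicants, the default rate observed among accepted loans understates the default rate of the whole applicant population.

```python
import math
import random


def simulate_applicants(n, seed=0):
    """Return (risk, defaulted) pairs for n synthetic applicants (illustrative only)."""
    rng = random.Random(seed)
    applicants = []
    for _ in range(n):
        risk = rng.gauss(0.0, 1.0)  # latent risk score, higher means riskier
        # Assumed logistic link between risk and default probability.
        p_default = 1.0 / (1.0 + math.exp(-(risk - 1.0)))
        defaulted = rng.random() < p_default
        applicants.append((risk, defaulted))
    return applicants


def default_rate(rows):
    """Share of rows whose loan defaulted."""
    return sum(1 for _, defaulted in rows if defaulted) / len(rows)


applicants = simulate_applicants(20000, seed=42)

# Hypothetical acceptance rule: only applicants with low risk are funded.
THRESHOLD = -0.5
accepted = [row for row in applicants if row[0] < THRESHOLD]
rejected = [row for row in applicants if row[0] >= THRESHOLD]

population_rate = default_rate(applicants)
accepted_rate = default_rate(accepted)
rejection_share = len(rejected) / len(applicants)

print(f"rejection share:         {rejection_share:.2f}")
print(f"default rate (accepted): {accepted_rate:.3f}")
print(f"default rate (everyone): {population_rate:.3f}")
```

In practice the outcomes of the rejected group are never observed, which is why a model trained on accepted loans alone sees a distorted picture. Reject inference methods, including semi-supervised ones, try to use the unlabeled rejected applications to narrow that gap.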

Author information

Corresponding author

Correspondence to Monir El Annas.

About this article

Cite this article

El Annas, M., Benyacoub, B. & Ouzineb, M. Semi-supervised adapted HMMs for P2P credit scoring systems with reject inference. Comput Stat 38, 149–169 (2023). https://doi.org/10.1007/s00180-022-01220-9

  • DOI: https://doi.org/10.1007/s00180-022-01220-9
