Abstract
The majority of current credit-scoring models used for loan approval are built from information on accepted credit applicants, whose ability to repay the loan is known. Because rejected applications are excluded, the training sample is not representative of the population of applicants, a problem known as selection bias. This bias undermines the validity of such models from both a statistical and an economic point of view. The problem is especially acute for models used by peer-to-peer (P2P) lending platforms, whose rejection rates are extremely high. The process of inferring information about rejected applicants when constructing a credit-scoring model is known as reject inference. This study proposes a semi-supervised learning framework based on hidden Markov models (SSHMM) as a novel reject inference method. Real data from the Lending Club platform, the most widely used online lending marketplace in the United States and worldwide, are used to evaluate the effectiveness of our method against existing approaches. The results illustrate the proposed method's superiority over these approaches, as well as its stability and adaptability.
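To make the idea of semi-supervised reject inference concrete, the sketch below shows self-training, a generic scheme. A model fitted on accepted applicants (whose outcomes are known) pseudo-labels the rejected applicants it is confident about, and the model is then refitted on the augmented sample. This is not the SSHMM method proposed in this study. It uses plain logistic regression instead of hidden Markov models. The function names, the 0.8 confidence threshold, the label convention (1 marks a default), and the synthetic one-feature data are all illustrative assumptions.

```python
import math
import random

# Label convention (assumption for this sketch): 1 = default ("bad"), 0 = repaid ("good").


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Logistic regression fitted by plain batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """Estimated probability of default for one applicant."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def self_training_reject_inference(X_acc, y_acc, X_rej, threshold=0.8, max_rounds=5):
    """Hypothetical helper: augment accepted data with confidently pseudo-labeled rejects."""
    X_lab, y_lab = list(X_acc), list(y_acc)
    unlabeled = list(X_rej)
    model = fit_logistic(X_lab, y_lab)  # initial model: accepted applicants only
    for _ in range(max_rounds):
        confident, remaining = [], []
        for x in unlabeled:
            p = predict_proba(model, x)
            if p >= threshold:
                confident.append((x, 1))  # inferred bad
            elif p <= 1.0 - threshold:
                confident.append((x, 0))  # inferred good
            else:
                remaining.append(x)  # too uncertain, stays unlabeled
        if not confident:
            break
        for x, label in confident:
            X_lab.append(x)
            y_lab.append(label)
        unlabeled = remaining
        model = fit_logistic(X_lab, y_lab)  # refit on the augmented sample
    return model, len(X_lab) - len(X_acc)


# Hypothetical toy data: one scaled risk feature per applicant.
random.seed(0)
X_acc = [[random.gauss(-1.0, 0.5)] for _ in range(50)] + [[random.gauss(1.0, 0.5)] for _ in range(10)]
y_acc = [0] * 50 + [1] * 10
# Rejected applicants lie mostly in the risky region and have no observed outcome.
X_rej = [[random.gauss(1.5, 0.7)] for _ in range(40)]

model, n_added = self_training_reject_inference(X_acc, y_acc, X_rej)
print("pseudo-labeled rejects:", n_added)
print("P(default | x=2.0):", round(predict_proba(model, [2.0]), 3))
```

A stricter `threshold` adds fewer pseudo-labels per round, but it lowers the risk that early misclassifications are reinforced in later rounds.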

Cite this article
El Annas, M., Benyacoub, B. & Ouzineb, M. Semi-supervised adapted HMMs for P2P credit scoring systems with reject inference. Comput Stat 38, 149–169 (2023). https://doi.org/10.1007/s00180-022-01220-9