Loan default prediction by combining soft information extracted from descriptive text in online peer-to-peer lending

Jiang, Cuiqing; Wang, Zhao; Wang, Ruiya; Ding, Yong

doi:10.1007/s10479-017-2668-z

Analytical Models for Financial Modeling and Risk Management
Published: 04 October 2017

Volume 266, pages 511–529, (2018)

Annals of Operations Research

Cuiqing Jiang¹, Zhao Wang¹, Ruiya Wang¹ (ORCID: orcid.org/0000-0002-4037-0271) & Yong Ding¹

Abstract

Predicting whether a borrower will default on a loan is a significant concern for platforms and investors in online peer-to-peer (P2P) lending. Because the data that online platforms use are complex and include unstructured information such as text, which is difficult to quantify and analyze, loan default prediction faces new challenges in P2P lending. To address this, we propose a default prediction method for P2P lending that incorporates soft information from the textual descriptions of loans. We introduce a topic model to extract valuable features from the descriptive text concerning loans and construct four default prediction models to demonstrate the performance of these features for default prediction. Moreover, a two-stage method is designed to select an effective feature set containing both soft and hard information. An empirical analysis using real-world data from a major P2P lending platform in China shows that the proposed method can improve loan default prediction performance compared with existing methods based only on hard information.
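
The idea of turning descriptive text into topic features and placing them next to hard features can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract says "a topic model" without naming it, so latent Dirichlet allocation with a small collapsed Gibbs sampler is assumed here; the function names, the toy tokenized descriptions, and the two hard features (loan amount, interest rate) are all hypothetical.

```python
import random


def lda_topic_proportions(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Assumed topic model: LDA fitted by collapsed Gibbs sampling.

    docs is a list of token lists; returns one topic-proportion row per document.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables used by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for word in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[word]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                v = word_id[word]
                old = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][old] -= 1
                topic_word[old][v] -= 1
                topic_total[old] -= 1
                # Conditional probability of each topic for this token (unnormalized).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                new = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = new
                doc_topic[d][new] += 1
                topic_word[new][v] += 1
                topic_total[new] += 1

    # Smoothed document-topic proportions; each row sums to 1.
    return [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]


def combine_features(hard_rows, topic_rows):
    """Append soft (topic) features to the hard features of each loan."""
    return [list(hard) + list(soft) for hard, soft in zip(hard_rows, topic_rows)]


# Hypothetical, already-tokenized loan descriptions.
descriptions = [
    ["expand", "shop", "inventory", "shop"],
    ["medical", "bills", "urgent", "medical"],
    ["inventory", "shop", "expand"],
    ["urgent", "bills", "medical"],
]
# Hypothetical hard features: loan amount and annual interest rate.
hard_rows = [[50000, 0.12], [8000, 0.18], [30000, 0.13], [5000, 0.20]]

topic_rows = lda_topic_proportions(descriptions, num_topics=2)
features = combine_features(hard_rows, topic_rows)
```

Each row of `features` then holds the hard features followed by the topic proportions and could be passed to any classifier. The abstract does not name the four prediction models or the steps of the two-stage selection, so neither is sketched here.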

Acknowledgements

The authors gratefully acknowledge the constructive comments of the anonymous referees, which considerably improved the quality and clarity of the paper. This work was funded primarily by the National Natural Science Foundation of China (Grant Nos. 71571059, 71331002 and 71731005) and the Humanities and Social Sciences Fund Projects of the Ministry of Education (Grant Nos. 13YJA630037, 15YJA630010).

Author information

Authors and Affiliations

School of Management, Hefei University of Technology, Hefei, Anhui, China
Cuiqing Jiang, Zhao Wang, Ruiya Wang & Yong Ding

Corresponding author

Correspondence to Ruiya Wang.

About this article

Cite this article

Jiang, C., Wang, Z., Wang, R. et al. Loan default prediction by combining soft information extracted from descriptive text in online peer-to-peer lending. Ann Oper Res 266, 511–529 (2018). https://doi.org/10.1007/s10479-017-2668-z

Published: 04 October 2017
Issue Date: July 2018
DOI: https://doi.org/10.1007/s10479-017-2668-z
