Heterogeneous ensemble learning with feature engineering for default prediction in peer-to-peer lending in China

Li, Wei; Ding, Shuai; Wang, Hao; Chen, Yi; Yang, Shanlin

doi:10.1007/s11280-019-00676-y

Published: 19 March 2019

World Wide Web, Volume 23, pages 23–45 (2020)

Abstract

In recent years, peer-to-peer (P2P) lending, a new form of unsecured financing conducted over the Internet, has boomed in China, and credit risk problems have inevitably followed. A key challenge for P2P lending platforms is to predict accurately the default probability of the borrower of each loan; a reliable default prediction model helps the platform avoid credit risk. Traditional default prediction models based on machine learning and statistical learning do not meet this need, because the credit data in data-driven P2P lending contain many missing values, are high-dimensional and are class-imbalanced, which makes the models difficult to train effectively. To address these problems, this paper proposes a new default risk prediction model based on heterogeneous ensemble learning. Three individual classifiers, extreme gradient boosting (XGBoost), a deep neural network (DNN) and logistic regression (LR), are combined using a linear weighted ensemble strategy. In particular, the model is able to process missing values: after discrete and rank features are generated, the missing values are added to the model for self-training. The hyperparameters are then optimized by the XGBoost model to improve the performance of the prediction model. Compared with the benchmark model, the proposed method significantly improves the accuracy of the prediction results. In conclusion, the proposed prediction method solves the class-imbalance problem.
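
The abstract describes the ensemble and the rank features only in outline, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It shows one common reading of a linear weighted ensemble (a normalised weighted average of per-model default probabilities) and a simple rank feature that leaves missing values as missing. The function names, the example weights and the example scores are all hypothetical; the abstract does not give the weights or the exact feature construction.

```python
from typing import List, Optional, Sequence


def linear_weighted_ensemble(
    probabilities: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Blend per-model default probabilities with a linear weighted average.

    probabilities[m][i] is model m's predicted default probability for loan i.
    Weights are normalised by their sum, so the result stays in [0, 1].
    """
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    n_loans = len(probabilities[0])
    if any(len(p) != n_loans for p in probabilities):
        raise ValueError("every model must score the same loans")
    return [
        sum(w * p[i] for w, p in zip(weights, probabilities)) / total
        for i in range(n_loans)
    ]


def rank_feature(values: Sequence[Optional[float]]) -> List[Optional[int]]:
    """Replace each observed value by its 1-based dense rank.

    Missing values (None) are kept as None rather than imputed, so a
    downstream model can still see that the field was missing.
    """
    observed = sorted({v for v in values if v is not None})
    rank_of = {v: r for r, v in enumerate(observed, start=1)}
    return [None if v is None else rank_of[v] for v in values]


# Hypothetical scores from three already-trained models for four loans.
p_xgb = [0.10, 0.80, 0.40, 0.05]
p_dnn = [0.20, 0.70, 0.50, 0.10]
p_lr = [0.30, 0.60, 0.30, 0.15]

# Hypothetical weights; in practice they would be tuned on validation data.
blended = linear_weighted_ensemble([p_xgb, p_dnn, p_lr], weights=[0.5, 0.3, 0.2])

# Hypothetical raw feature (for example, a stated income) with one missing value.
ranks = rank_feature([5200.0, None, 3100.0, 5200.0])
```

Normalising by the weight sum keeps the blended score a valid probability even when the supplied weights do not sum to one, and the dense ranking gives tied values the same rank.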

Funding

This work was funded by the National Natural Science Foundation of China under Grant Nos. 91846107 and 71571058, and by the Anhui Provincial Science and Technology Major Project under Grant Nos. 16030801121 and 17030801001.

Author information

Authors and Affiliations

School of Management, Hefei University of Technology, Hefei, 230009, Anhui, China
Wei Li, Shuai Ding, Hao Wang, Yi Chen & Shanlin Yang
Key Laboratory of Process Optimization and Intelligent Decision-Making (Ministry of Education), Hefei University of Technology, Hefei, 230009, Anhui, China
Wei Li, Shuai Ding, Hao Wang, Yi Chen & Shanlin Yang

Corresponding authors

Correspondence to Shuai Ding or Shanlin Yang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, W., Ding, S., Wang, H. et al. Heterogeneous ensemble learning with feature engineering for default prediction in peer-to-peer lending in China. World Wide Web 23, 23–45 (2020). https://doi.org/10.1007/s11280-019-00676-y

Received: 03 July 2018
Revised: 19 December 2018
Accepted: 12 March 2019
Published: 19 March 2019
Issue Date: January 2020
DOI: https://doi.org/10.1007/s11280-019-00676-y
