Two-stage adaptive integration of multi-source heterogeneous data based on an improved random subspace and prediction of default risk of microcredit

  • S.I. : SPIoT 2020
Neural Computing and Applications

Abstract

Some scholars have shown that machine learning methods based on single-source data can successfully monitor the risks of formal financial activities, but not those of informal financial activities. This is because the data generated by formal financial activities, whether structured or unstructured, are high in both quality and quantity, whereas the data generated by informal financial activities are not. Multi-source data are therefore the key to monitoring the risks of informal financial activities through machine learning. Although a few studies have attempted to use multi-source data for financial risk prediction, they simply stack the data and ignore its original sources, heterogeneity, mutual redundancy and other characteristics, so the improvement in prediction performance is not obvious. This paper therefore constructs the TSAIB_RS method, based on the two-stage adaptive integration of multi-source heterogeneous data, in which data from different sources and with different distributions are adaptively integrated. To test the reliability of the TSAIB_RS method, the paper takes the default risk of microcredit in China as the test target and compares the prediction results of various methods. It concludes that the TSAIB_RS method can significantly improve prediction performance.

Acknowledgements

This paper is one of the mid-term results of the Humanities and Social Science Planning Project funded by the Ministry of Education of the PRC, “Researches of the Formation Mechanism of Low Efficiency of Poverty Alleviation of Microcredit and Innovation Practice Model in Jiangsu Province” (20YJA790028); a major project of philosophy and social science research in Jiangsu universities, “Researches on the Optimization of Fintech Innovation Supervision Path” (2019SJZDA060); and an Anhui Province Social Science Association Project (2019CX079).

Author information

Corresponding author

Correspondence to Fei Wu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Huang, A., Wu, F. Two-stage adaptive integration of multi-source heterogeneous data based on an improved random subspace and prediction of default risk of microcredit. Neural Comput & Applic 33, 4065–4075 (2021). https://doi.org/10.1007/s00521-020-05489-z

  • DOI: https://doi.org/10.1007/s00521-020-05489-z
