An Automatic Credit Scoring Strategy (ACSS) using memetic evolutionary algorithm and neural architecture search

https://doi.org/10.1016/j.asoc.2021.107871

Highlights

  • We utilize an improved SMOTE that could decrease the impact of data imbalance that exists in the credit data.

  • We propose to leverage the credit feature pruning algorithm and memetic optimization algorithm.

  • We propose an automated credit scoring strategy (ACSS) method with a cost-effective neural architecture search (C-NAS) scheme.

Abstract

Credit scoring is playing an increasingly critical role as the number of lending operations for micro and small enterprises and for individuals rises. Much existing research relies on experts to combine and construct credit scoring models. However, different credit data place distinct requirements on the models, so automatically searching for and constructing credit scoring models according to the credit data has become essential; this is the main concern of this paper. In response to the current challenges in credit scoring research, we propose an Automatic Credit Scoring Strategy (ACSS) and design a credit assessment platform that includes data import, automatic classification model search, feature selection, hyperparameter optimization, data mining, classification output and other modules. To address the substantial imbalance in credit data, we propose an improved SMOTE algorithm that generates supplementary samples for the under-represented minority class, thereby balancing the credit data distribution. For classification model selection, feature engineering, parameter optimization and other steps, we further incorporate automatic search to reduce manual interaction. We use public and self-owned credit data sets to conduct experiments and compare ACSS with the latest credit assessment methods. Extensive experiments demonstrate that our ACSS method achieves noticeable performance improvements on the German credit dataset, the Taiwan credit dataset and the personal credit dataset. The best results achieved by our proposed method are 0.98, 0.99, 0.895 and 0.901 for MAE, RMSE, accuracy and precision respectively. In addition, the experimental results show that our improved SMOTE algorithm contributes to the credit scoring performance. The experimental findings suggest that our proposed ACSS balances automation and accuracy and can be applied in the consumer industry to enhance the reliability of credit assessments.

Introduction

Credit assessment is now widely used in finance, consumption, insurance, education and other industries, with significant applications in medical care, loans, education, employment and other tasks. The COVID-19 epidemic that broke out in early 2020 spread across the globe, making it the most serious global crisis to confront humanity since World War II. The epidemic brought unprecedented shocks to countries around the world and significantly increased economic instability, leading to a noticeable credit crisis for a great number of companies, individuals and financial institutions, which in turn further worsened the economic environment worldwide. Constructing efficient and stable credit scoring models is therefore an essential research topic for alleviating the financial and credit crisis caused by the epidemic and gradually boosting the global economic recovery.

The assessment of credit risk is typically based on credit scoring models, which are widely used to estimate an applicant’s default probability. The key issue in credit risk evaluation is how to classify applicants into two main groups: default and non-default. The assessor may then decide to refuse or accept the loan application after the credit assessment procedure. Because of its role in credit risk management, credit scoring has drawn tremendous attention in the financial industry. A range of artificial intelligence and machine learning models are therefore often used in credit scoring, with the aim of improving classification efficiency through incremental improvements to the scoring model. With the widespread adoption of data mining techniques, credit data analysis and mining can address the marketing, pricing, fraud and credit issues induced by information asymmetry. Compared with conventional credit evaluation methods, the most fundamental advance of big-data credit evaluation lies in analyzing a vast volume of non-financial data in order to take full account of the credit status of the party being evaluated. Many studies also use data mining methods. A credit evaluation algorithm based on the Bayes formula, which takes into account data from alternative sources and the option of the bank’s cross-product provided to the client, was proposed by Sergei [1]. Li et al. [2] developed a machine-learning-based predictive model for the automated credit evaluation of healthy elderly services. The credit data is computed and analyzed to obtain attributes related to credit ratings. In addition to individual credit assessment, data mining techniques are also used in corporate credit evaluation. He et al. [3] proposed a new heterogeneous ensemble credit model that incorporates the stacking algorithm. A wide variety of models, including individual classifiers and homogeneous and heterogeneous ensemble models, are adopted as benchmarks to validate the efficiency of the proposed stacking strategy. Wei Wang et al. [4] investigated a blockchain-based distributed credit assessment system whose intelligent protocols and decentralized features provide credit histories through immutable timestamps and distributed ledgers, thereby remedying the defects of existing centralized credit evaluation systems. Current research on credit scoring revolves around ensemble methods and single classifiers, with different researchers devising various classifier integration methods based on the characteristics of the credit data. However, this approach requires too much manual intervention, and it is difficult for those without modeling experience to identify suitable credit scoring ensemble models. Existing work therefore raises the question of whether a credit scoring model generation method can be proposed that reduces human intervention, is highly automated, and still performs well.

There are several challenges in credit scoring. The first challenge is the imbalance of credit data. Credit data is of diverse types and comes from a wide range of sources, yet existing credit scoring methods pay little attention to the impact of credit data imbalance on model performance. It is therefore a pressing issue to pre-process credit data so that the distribution of the different credit categories is as balanced as possible, thus improving the accuracy of credit scoring models. The second challenge is that the diversity of credit data types leads to credit models that are neither universally applicable nor scalable. Existing credit scoring models are developed by training on the credit data, selecting an appropriate model based on expert experience, and combining human-set hyperparameters with other optimization approaches to reach a satisfactory result. However, as credit data varies greatly between industries and regions, it is extremely difficult for existing credit scoring models to achieve comparable evaluation results on different credit data. The third challenge is that building a credit scoring model requires a great deal of human involvement, which hinders the widespread adoption of credit scoring models. As a typical machine learning classification problem, credit scoring modeling involves extensive manual intervention in model construction, hyperparameter selection, model training and other steps. In the face of the global economic downturn caused by the COVID-19 pandemic, however, banks, micro and small businesses, audit firms and other institutions that need credit scoring services must be able to obtain the most efficient and automated model construction method feasible. Fortunately, the advancement of neural architecture search (NAS) sheds light on this issue.

While many existing NAS methods are able to learn network architectures, most of them were designed for image classification problems, which generally have high-quality labels. Recent credit scoring methods, by contrast, require considerable work on the design of ensemble models, both homogeneous and heterogeneous, and improve credit scoring accuracy by tuning the ratio of base classifiers. Because applying NAS methods directly to credit data would consume significantly more time, we propose an economical NAS method for credit scoring that prunes the candidate models according to a calculated importance, thus improving the search efficiency of NAS and reducing unnecessary search time.

Credit scoring is typically a classification model construction problem. Machine learning datasets are generally multi-dimensional, and irrelevant and redundant features not only degrade a classification model’s predictive performance but may also raise computational complexity. Feature extraction and selection are therefore regarded as promising methods in machine learning: identifying the key features reduces computation time and enhances the predictive performance of classification models. We carried out a variety of experiments to quantitatively evaluate the performance of the C-NAS framework, and the results show the effectiveness of our design. We also used the optimal model to extract the features that best reflect personal credit from the public and personal credit datasets, and compared the results with state-of-the-art scoring models. The main contributions are as follows.

(1) We propose an improved SMOTE that decreases the impact of the data imbalance present in credit data by augmenting the data for the small-sample classes.

(2) We propose to leverage a credit feature pruning algorithm and a memetic optimization algorithm, which remove more irrelevant features based on the calculated credit feature importance and shorten the model search time.

(3) We propose an automated credit scoring strategy (ACSS) with a cost-effective neural architecture search (C-NAS) method to improve the accuracy of credit classification and reduce the unnecessary human effort of designing the model and adjusting the hyperparameters.
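To make the oversampling idea in contribution (1) concrete, the following is a minimal sketch of the standard SMOTE interpolation step, not the improved variant proposed in this paper, whose details are not given in this section. The function name `smote_oversample`, the neighbour count `k` and the toy minority samples are all assumptions introduced for illustration.

```python
import random

def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by linear interpolation.

    Baseline SMOTE idea only: pick a minority sample, pick one of its k
    nearest minority neighbours, and place a new point on the segment
    between them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority class with two numeric features (illustrative values only).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_oversample(minority, n_new=6)
```

Because each synthetic point is a convex combination of two existing minority samples, it always lies within the range already spanned by the minority class, which is the property that makes this kind of oversampling a common starting point for imbalanced credit data.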

The remainder of the paper is structured as follows. Section 2 introduces current research on credit scoring methods and the evolutionary algorithms that have been adopted. Section 3 introduces our proposed ACSS method for automatic credit scoring. Section 4 describes the experimental setup. Section 5 presents the experimental results for each of the three research questions, together with discussion and analysis. Section 6 further analyzes the advantages of our ACSS approach in terms of balancing automation and accuracy. Section 7 summarizes the primary contributions and drawbacks of this paper and outlines our future research.

Section snippets

Related work

To put credit scoring modeling research in context, we describe and summarize the relevant background.

Automatic credit scoring method

The automatic credit scoring modeling approach presented in this paper incorporates the key elements of credit data mining, in particular a fully automated machine learning pipeline for credit classification, which consists of four critical components: (1) extraction of credit data features; (2) selection of features for credit assessment; (3) search for a classification model; (4) optimization of the hyperparameters.

These phases are completely automatic, with both the input being credit data
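As an illustration of how the feature selection component might combine evolutionary search with local refinement, here is a minimal sketch of a memetic algorithm over feature masks. It is not the authors' algorithm: the importance scores in `IMPORTANCE`, the penalty `COST_PER_FEATURE` and the toy fitness function are hypothetical stand-ins, and in practice the fitness would more plausibly be a validation score of the classifier trained on the selected features.

```python
import random

# Hypothetical importance scores for six credit features (illustrative only).
IMPORTANCE = [0.30, 0.02, 0.25, 0.01, 0.15, 0.03]
COST_PER_FEATURE = 0.05  # assumed penalty for each selected feature

def fitness(mask):
    """Toy fitness: total importance of selected features minus a size penalty."""
    return sum(w - COST_PER_FEATURE for w, bit in zip(IMPORTANCE, mask) if bit)

def local_search(mask):
    """One pass of bit-flip hill climbing: the local refinement step."""
    best = list(mask)
    for i in range(len(best)):
        trial = best[:i] + [1 - best[i]] + best[i + 1:]
        if fitness(trial) > fitness(best):
            best = trial
    return best

def memetic_select(pop_size=8, generations=10, mutation_rate=0.1, seed=0):
    """Evolutionary search (crossover + mutation) with local search on offspring."""
    rng = random.Random(seed)
    n = len(IMPORTANCE)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fitter half of the population as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)            # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(local_search(child))  # refine each offspring locally
        population = parents + children
    return max(population, key=fitness)

best_mask = memetic_select()
```

With this separable toy fitness the local search alone is enough to reach the best mask, which keeps exactly the features whose assumed importance exceeds the penalty. With a realistic, non-separable fitness the crossover and mutation steps would carry more of the search.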

System overview

Because searching for credit scoring models is relatively time-consuming, we employ bootstrapping to analyze variability over numerous repeats of each experiment. We run each AutoML framework 30 times per credit data set, then select 5 of the 30 results at random and take the best of these 5 as the final result. This is repeated 200 times for each AutoML platform and credit data set, and statistics are computed over the resulting distributions.
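The resampling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the 30 run scores are random placeholders rather than experimental results, a higher score is assumed to be better (for error metrics such as MAE or RMSE, `max` would become `min`), and the 5 results are drawn without replacement, which the text does not specify. The function name `bootstrap_best_of` is hypothetical.

```python
import random
import statistics

def bootstrap_best_of(scores, subset_size=5, repeats=200, seed=0):
    """Repeatedly draw subset_size results at random and keep the best one."""
    rng = random.Random(seed)
    return [max(rng.sample(scores, subset_size)) for _ in range(repeats)]

# Placeholder scores standing in for 30 runs of one framework on one data set.
placeholder_rng = random.Random(42)
run_scores = [round(placeholder_rng.uniform(0.80, 0.90), 4) for _ in range(30)]

best_scores = bootstrap_best_of(run_scores)

# Summary statistics over the distribution of "best of 5" results.
summary = {
    "mean": statistics.mean(best_scores),
    "stdev": statistics.stdev(best_scores),
    "min": min(best_scores),
    "max": max(best_scores),
}
```

Taking the best of a small random subset, rather than the best of all 30 runs, gives a distribution of outcomes that reflects what a user with a limited run budget could expect, and it lets variability be compared across frameworks.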

Results and discussion

The aim of this study is to validate the efficiency and superiority of our proposed memetic C-NAS based ACSS approach on imbalanced credit data sets. Hence, this section outlines the main research questions and explains the experiments designed to answer them.

RQ1: Does the proposed ACSS using memetic cost-effective NAS perform better than other widely used techniques on the credit datasets?

RQ2: Does our approach solve the problem of data imbalance? How effective are the

Is it possible to balance automation and high performance? A further analysis of the results for ACSS

With the advantage of automated credit scoring model construction, our proposed ACSS presents, for the first time, a technique for determining credit scoring models by means of neural architecture search, without the need for extensive human intervention in model design and hyperparameter selection. Accordingly, we need to compare our approach with the currently available mainstream credit scoring methods to verify whether our ACSS approach can balance automation and high

Conclusion and future work

Credit appraisal has been extended to all fields of contemporary life, influencing and transforming the lives of all consumers. Current research on credit scoring primarily focuses on developing integrated classification models that cannot cover the entire data classification process. This paper therefore proposes a credit scoring model based on automated machine learning approaches that can efficiently combine data collection, feature discovery, model search, model detection and other

CRediT authorship contribution statement

Fan Yang: Conceptualization, Methodology, Software, Writing – original draft. Yanan Qiao: Supervision. Cheng Huang: Methodology, Validation. Shan Wang: Data curation. Xiao Wang: Writing – reviewing and editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research is supported by the National Key R&D Program of China under Grant No. 2018YFB1402700. This work is also supported by the Fundamental Research Funds for the Central Universities, China under Grant No. xzy022020056. This work is also supported by a scholarship from the China Scholarship Council (CSC) under Grant No. 201906280499 while the first author was studying at Leiden University. This work is also supported by the Blockchain Core Technology Strategic Research Program, China

References (55)

  • Soui M. et al. Rule-based credit risk assessment model using multi-objective evolutionary algorithms. Expert Syst. Appl. (2019)
  • Fu X. et al. Topology optimization against cascading failures on wireless sensor networks using a memetic algorithm. Comput. Netw. (2020)
  • Gong G. et al. An effective memetic algorithm for multi-objective job-shop scheduling. Knowl.-Based Syst. (2019)
  • Alsmadi M.K. An efficient similarity measure for content based image retrieval using memetic algorithm. Egypt. J. Basic Appl. Sci. (2017)
  • Iacca G. et al. Ockham’s razor in memetic computing: three stage optimal memetic exploration. Inform. Sci. (2012)
  • Lessmann S. et al. Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European J. Oper. Res. (2015)
  • Xia Y. et al. A novel heterogeneous ensemble credit scoring model based on bstacking approach. Expert Syst. Appl. (2018)
  • Wang L. et al. Imbalanced credit risk evaluation based on multiple sampling, multiple kernel fuzzy self-organizing map and local accuracy ensemble. Appl. Soft Comput. (2020)
  • Shen F. et al. A new deep learning ensemble credit risk evaluation model with an improved synthetic minority oversampling technique. Appl. Soft Comput. (2021)
  • Wu C.-F. et al. A predictive intelligence system of credit scoring based on deep multiple kernel learning. Appl. Soft Comput. (2021)
  • Zakharov S. et al. Development of analytical CRM - the system of assesment of solvency of borrowers of commercial bank
  • C. Li, Y. Zhao, S. Li, P. Wang, Z. Zhao, Design and implementation of credit evaluation system for healthy aged...
  • W. Wang, A SME Credit Evaluation System Based on Blockchain, in: 2020 International Conference on E-Commerce and...
  • van Rijn J.N. et al. The online performance estimation framework: heterogeneous ensemble learning for data streams. Mach. Learn. (2018)
  • Guo D. et al. Heterogeneous ensemble-based infill criterion for evolutionary multiobjective optimization of expensive problems. IEEE Trans. Cybern. (2018)
  • Li W. et al. Heterogeneous ensemble for default prediction of peer-to-peer lending in China. IEEE Access (2018)
  • Sherstjuk V. et al. Forest fire fighting using heterogeneous ensemble of unmanned aerial vehicles
