1 Introduction

Latin American economies experienced currency crises and the associated turmoil from the early 1990s through the early 2000s, so GDP growth in 1990–2002 was sluggish [1]. Since the 2000s, however, capital investment has expanded vigorously, driven by global economic expansion, rising primary commodity prices, growing exports, and inflows of investment funds. In addition, the region has achieved rapid economic growth since 2003 owing to the expansion of personal consumption. In Latin American country “A”, the expansion of exports and the external demand created by rising primary commodity prices led to the expansion of the consumer finance market. Financial services were also extended to people who had previously been unable to take out loans, because the income disparity correction policy implemented from 2003 to 2010 lifted the poor into the middle class [2].

As a result, although loan-financed purchases of motorcycles and cars increased [3], many customers did not understand the terms of their loan contracts [4], and the rate of bad debts caused by excessive debt-financed consumption has risen [2].

In this research, we look for the factors behind bad debt using customer data.

2 The Data Overview

The data used in this study are anonymized data on motorcycle customers in Latin American country “A” from September 2010 to June 2012. The data comprise the score of credit agency A, the score of credit agency B, tax payment history, years of employment, marital status, sex, employment status, residential status, main income, side income, dealer assessment, number of installments, down payment, loan amount, interest amount, occupation, academic background, house type, region, product type, displacement, size, 6 months Bad, 12 months Bad, 18 months Bad, and so on.

(6 months Bad, 12 months Bad, and 18 months Bad are flags that are set when the customer's payments have not reached the amount due by the respective deadline after purchase. Therefore, a customer who is flagged as 6 months Bad is also flagged as 12 months Bad and 18 months Bad. Once set, the flag is never reverted.)

The number of customers in Latin American country “A” was 14,304.

3 The Research Purpose

By the definition given in the data overview, 18 months Bad covers the largest number of customers among 6, 12, and 18 months Bad. Fig. 1 suggests that the situation is serious, because about 20% of all customers are 18 months Bad. Because the data contain many items, it is necessary to extract the variables that are central to 18 months Bad. In this study, we use AUC to extract the variables that influence 18 months Bad and analyze the influence of the extracted variables by logistic regression analysis. Based on the results, we characterize 18 months Bad customers. AUC is defined as the area under the ROC curve and is an index for measuring the accuracy of a model: the larger the area, the higher the accuracy.
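The definition of AUC can be written out as follows. This is a standard textbook formulation rather than notation taken from this study: the symbols $s(x)$ (the model score of customer $x$), $x^{+}$ (a randomly chosen Bad customer), and $x^{-}$ (a randomly chosen non-Bad customer) are assumptions introduced here for illustration.

```latex
\mathrm{AUC}
  = \int_{0}^{1} \mathrm{TPR}\; \mathrm{d}(\mathrm{FPR})
  = \Pr\bigl(s(x^{+}) > s(x^{-})\bigr)
    + \tfrac{1}{2}\,\Pr\bigl(s(x^{+}) = s(x^{-})\bigr)
```

Under this reading, an AUC of 0.5 corresponds to a score that ranks Bad and non-Bad customers no better than chance, and an AUC of 1 to a score that ranks every Bad customer above every non-Bad customer.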

Fig. 1. Per 6, 12, 18 months “Bad” transition of proportion and total

4 The Analysis

The analysis proceeds in the following steps: data cleaning, basic aggregation, grouping, variable extraction by AUC, and logistic regression analysis.

First, in data cleaning, we fill in missing data and remove records that cannot be used. Next, we grasp trends in the data by basic aggregation. Then we divide the customers into groups by the amount of main income. In the AUC step, the variables affecting 18 months Bad are extracted using the area under the ROC curve. Finally, we examine how the variables extracted by AUC affect 18 months Bad.

4.1 The Data Cleaning

First, we filled in the missing data. Because the score of credit agency B had no blanks, we used it to fill in the score of credit agency A and the main income. In addition, we removed customer records that still had blanks in the interest, the down payment, the amount borrowed, or the age after this step, as well as customer records for which the values could not be calculated. As a result, the number of customer records was 13,217.

The score of credit agency A was calculated as follows. We take the score of credit agency B of each customer whose score of credit agency A is blank, calculate an average from the scores of credit agencies A and B, and enter the resulting score of credit agency A in the blank. We then fill in the main income in the same way, using the score of credit agency A.
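The text does not spell out the exact averaging rule, so the following is only one possible reading, sketched under a stated assumption: a blank score of credit agency A is filled with the mean score A of the customers who share the same score B. The field names `score_a` and `score_b`, the function `impute_score_a`, and the toy records are all hypothetical.

```python
from collections import defaultdict


def impute_score_a(customers):
    """Fill blank score_a with the mean score_a of customers sharing score_b.

    Assumption: this grouping rule is an illustrative guess, not the
    study's documented procedure.
    """
    # Accumulate [sum, count] of the known score_a values per score_b value.
    sums = defaultdict(lambda: [0.0, 0])
    for c in customers:
        if c["score_a"] is not None:
            acc = sums[c["score_b"]]
            acc[0] += c["score_a"]
            acc[1] += 1

    filled = []
    for c in customers:
        c = dict(c)  # copy so the input records are left untouched
        if c["score_a"] is None and c["score_b"] in sums:
            total, count = sums[c["score_b"]]
            c["score_a"] = total / count
        filled.append(c)
    return filled


# Hypothetical toy records; None marks a blank.
toy = [
    {"score_a": 600, "score_b": 3},
    {"score_a": 700, "score_b": 3},
    {"score_a": None, "score_b": 3},
    {"score_a": None, "score_b": 9},
]
imputed = impute_score_a(toy)
```

In this sketch the last record stays blank because no customer with the same score B has a known score A; such a record would fall under the data that cannot be calculated and would be removed.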

4.2 The Basic Aggregation

According to Fig. 2, the percentage of Bad is high in the Midwest, Northeast, and Northern regions. In addition, the average main income is lower in the regions with a higher rate of Bad. Next, we look at the educational backgrounds and occupations of customers by region, which are relevant to the main income of “Bad” customers (Figs. 3 and 4). The same tendency was seen in all regions: the proportion of “Salary earners” among occupations and the proportion of “Graduated from Educational background 3” among academic records were both large.

Fig. 2. By region percentage of main income and “Bad” ratio

Fig. 3. Percentage of type of occupation by region

Fig. 4. Percentage of type of educational background by region

In addition, following prior research, the scores of credit agency A and credit agency B were each divided into groups. The ratio of 18 months Bad was calculated for each group, and the results are shown in Fig. 5. For credit agency A, the lower the score, the clearly higher the proportion of 18 months Bad. For credit agency B, the lower the score, the higher the proportion of 18 months Bad as well, but the trend is not clear for groups 0, 1, and 2.

Fig. 5. Percentage of “18 months Bad” by group of credit agency

4.3 The Grouping

There is a clear difference in the ratio of Bad between the north and the south. Moreover, because main income is considered to influence Bad, we divide the customers into groups by main income. The grouping criterion is the income classification published by the Ministry of Economy, Trade and Industry.

The A/B stratum earns 7,475 or more per month, the C stratum earns 1,734 to less than 7,435 per month, the D stratum earns 1,085 to less than 1,734 per month, and the E stratum earns less than 1,085 per month.

Thus the A/B stratum is the affluent class, the C stratum is the middle class, and the D/E strata are the poor class. We classified the customers by this criterion. Table 1 shows that the proportion of “Bad” is high in the poor D and E strata.

There is also a reason for combining the A and B strata: according to Fig. 6, the A/B stratum is a small part of the whole population, and it also accounts for only a small number of records in this data [3] (Table 1).
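The grouping by main income can be sketched as below. The thresholds are the monthly amounts quoted in the text; because the text gives 7,435 as the upper limit of the C stratum but 7,475 as the lower limit of A/B, this sketch assumes that everything below 7,475 and at or above 1,734 is C. The function names and the toy records are hypothetical.

```python
def income_stratum(main_income):
    """Map a monthly main income to an income stratum label."""
    if main_income >= 7475:
        return "A/B"
    if main_income >= 1734:  # assumption: the 7,435-7,475 gap is treated as C
        return "C"
    if main_income >= 1085:
        return "D"
    return "E"


def bad_rate_by_stratum(customers):
    """Return {stratum: share of 18 months Bad} for (income, bad) pairs."""
    counts = {}
    for income, bad in customers:
        stratum = income_stratum(income)
        total, n_bad = counts.get(stratum, (0, 0))
        counts[stratum] = (total + 1, n_bad + bad)
    return {s: n_bad / total for s, (total, n_bad) in counts.items()}


# Hypothetical toy records: (monthly main income, 18 months Bad flag).
toy = [(8000, 0), (2000, 0), (2000, 1), (1200, 1), (1200, 1), (1200, 0), (900, 0)]
rates = bad_rate_by_stratum(toy)
```

The toy values are made up for illustration and do not reproduce the proportions reported in Table 1.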

Fig. 6. Population transition by stratum

Table 1. Number of data by income stratum and percentage of 18 months Bad

4.4 The Extracting Variables

With 18 months Bad in the cleaned data as the objective variable and the other variables as explanatory variables, we apply logistic regression analysis to the explanatory variables and obtain the predicted probabilities. We then obtain the area under the ROC curve for the explanatory variables. As a result, the following variables were adopted: the presence/absence of negative information in the customer list, the number of inquiries to the customer list, the age, the value of real estate, the rate of list price, the amount borrowed, the interest amount, the score of credit agency A, the score of credit agency B, product A, product B, and D stratum. The AUC of the score of credit agency A is 0.688, and that of the score of credit agency B is 0.594. This shows that the score of credit agency A is more strongly related to 18 months Bad and is therefore the more reliable score (Table 2).
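A minimal sketch of ranking explanatory variables by AUC is given below. It relies on a simplification that should be stated plainly: for a logistic regression with a single explanatory variable, the predicted probability is a monotone function of that variable, so its AUC equals the AUC of the raw variable, or one minus it when the coefficient is negative. The sketch therefore skips the model fit and takes the larger of the two values. The variable names and toy records are hypothetical, and this is not presented as the study's actual procedure.

```python
def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one Bad and one non-Bad customer")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def variable_auc(labels, values):
    """AUC of a single variable, regardless of the direction of its effect."""
    a = auc(labels, values)
    return max(a, 1.0 - a)


# Hypothetical toy records; a lower score_a goes with more Bad customers.
customers = [
    {"score_a": 720, "inquiries": 0, "bad_18m": 0},
    {"score_a": 650, "inquiries": 1, "bad_18m": 0},
    {"score_a": 610, "inquiries": 3, "bad_18m": 1},
    {"score_a": 580, "inquiries": 2, "bad_18m": 0},
    {"score_a": 540, "inquiries": 4, "bad_18m": 1},
    {"score_a": 500, "inquiries": 1, "bad_18m": 1},
]
labels = [c["bad_18m"] for c in customers]
ranking = sorted(
    ((variable_auc(labels, [c[name] for c in customers]), name)
     for name in ("score_a", "inquiries")),
    reverse=True,
)
```

Variables whose AUC stays close to 0.5 carry little information about 18 months Bad and would be candidates for exclusion; the cut-off used for adoption is not stated in the text.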

Table 2. Analysis result of ROC curve
Table 3. Analysis result of logistic regression analysis

4.5 Logistic Regression Analysis

We perform logistic regression analysis using the variables adopted in Sect. 4.4. Logistic regression analysis is a method for predicting the probability that an event occurs. Based on the analysis, we judge the probability that a customer becomes Bad. According to Table 3, the variables whose Exp (B) value is 1 or more are the number of inquiries to the customer list, the ratio of the list price, product B, and D stratum. Customers to whom these variables apply tend to become “Bad”. Conversely, the variables whose Exp (B) value is less than 1 are the presence or absence of negative information in the customer list and product A. Customers to whom these variables apply are less likely to become “Bad”. In addition, the debt is numerical data with an Exp (B) value above 1, so the higher its value, the more likely the customer is to become “Bad”. The interest amount, the score of credit agency A, and the score of credit agency B are numerical data as well, but their Exp (B) values are below 1, so the lower their values, the more likely the customer is to become “Bad”.
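The reading of Exp (B) used here can be illustrated with a small sketch. In logistic regression, raising a variable by one unit multiplies the odds of the outcome by Exp (B), so a value above 1 raises the odds of “Bad” and a value below 1 lowers them. The intercept and coefficients below are hypothetical numbers chosen for illustration, not the values in Table 3.

```python
import math


def predicted_probability(intercept, coefficients, values):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


# Hypothetical B values: [number of inquiries, product A indicator].
intercept = -1.5
coefficients = [0.4, -0.7]
exp_b = [math.exp(b) for b in coefficients]  # odds ratios, Exp(B)

base = predicted_probability(intercept, coefficients, [1, 0])
more_inquiries = predicted_probability(intercept, coefficients, [2, 0])
product_a = predicted_probability(intercept, coefficients, [1, 1])

# One extra inquiry multiplies the odds by Exp(B) of the inquiry variable.
ratio_inquiries = odds(more_inquiries) / odds(base)
# Buying product A multiplies the odds by Exp(B) of the product A indicator.
ratio_product_a = odds(product_a) / odds(base)
```

With these made-up coefficients, the customer with one more inquiry has a higher predicted probability of “Bad” than the baseline, and the product A purchaser has a lower one, which matches the direction of the interpretation above.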

5 Consideration

Based on these results, we consider two main points. The first is the comparison between product A and product B. Since purchasers of product B are likely to become 18 months Bad customers, the customer data of product B purchasers needs to be reviewed. Purchasers of product A are unlikely to become 18 months Bad customers, so the range of purchasers of product A could be expanded.

The second concerns the factors that make D stratum customers more likely to become 18 months Bad customers. This is also considered to be a consequence of the expansion of financial services. By contrast, customers in the lowest E stratum are unlikely to become 18 months Bad customers, presumably because loans were not granted to them in the first place.

6 The Future Tasks

In this analysis, we identified the variables that affect 18 months Bad. However, the payment collection rate from customers who are 6 months Bad and 12 months Bad is lower than that from 18 months Bad customers because they have made fewer payments. Therefore, we will carry out the same analysis for 6 months Bad and 12 months Bad and extract the influential variables. Moreover, we will reanalyze each of the adopted variables. For example, combining decision tree analysis with the present method, we will search for the customers who will become “Bad” and identify which variables apply to them in particular. We will also run the analysis both with and without the scores of credit agencies “A” and “B”. From the results, we can judge whether a new credit risk model is necessary, and we will propose a new evaluation method for credit risk models.

In addition, the score of credit agency A is the most effective evaluation measure, as indicated by the basic aggregation, the AUC results, and prior research. However, what constitutes the score of credit agency A is unclear. Therefore, it is necessary to grasp the characteristics of this score.