Abstract
In recent years, as the electronic commerce (EC) market has expanded, competition to acquire customers is expected to intensify. EC companies therefore need to identify, as early as possible, new customers who have the potential to become loyal customers. In this study, we analyze customer behavior using customer membership information, purchase records, and web access logs from a golf EC site. First, we evaluate customer loyalty using RFM analysis and divide customers into loyal and general ones. Next, we perform logistic regression to discriminate loyalty using first-time purchase and browsing behaviors. Through this analysis, we built a model to predict loyal customers and clarified the characteristic behaviors of highly loyal customers.
1 Introduction
In recent years, electronic commerce (hereinafter called “EC”) has continued to evolve at a rapid pace [1]. As the EC market expands, competition to acquire customers is expected to intensify. Choosing appropriate target customers is therefore very important for expanding sales and improving profitability.
For this reason, EC companies need to identify, as early as possible, new customers who have the potential to become loyal customers. Here, the first purchase date can be considered a key point: we look for behaviors common to these customers at their initial purchase. As customer satisfaction rises, companies improve their sales and profits, so a relationship from which both sides benefit is desirable.
Figure 1 shows the framework of the customer hierarchy. First, customers visit the website. Upper-level customers purchase frequently and spend large amounts. Finding these loyal customers and developing new ones are therefore very important strategies for a retail company.
In this study, we focus on new customers. The purpose is to clarify the characteristic behaviors of highly loyal customers using customer membership information, purchase data, and access history data.
2 Datasets
We target a general electronic commerce website (hereinafter called “EC site”) related to golf. The EC site provides services such as sales of golf equipment, golf course reservations, and golf score management. From these services, we used the following data.
- Customer information data (age, sex, registration date, etc.)
- Purchase history data (category of purchased items, purchase date, whether the purchased item is brand-new or secondhand, etc.)
- Access history data (login date and time, URL of the accessed page, URL of the referrer page, etc.)
The category names of the products included in the purchase data are shown in Table 1.
Target Customer
In this study, we analyzed 5,553 customers who made their first purchase between May 1, 2015, and July 30, 2015, and who purchased more than twice within a year of the initial purchase date. We excluded customers for whom more than one year had passed since registration.
In Fig. 2, we show the target period used in this research.
Explanatory Variables
We considered the factors that influence the first purchase using the above data. Based on the result, we created explanatory variables from customers’ membership information (5 variables), purchasing behavior at the time of the initial purchase (11 variables), and web browsing behavior on the initial purchase date (13 variables) [4].
Details of the explanatory variables are shown in Tables 2, 3 and 4.
Table 2 presents the demographic variables created from membership information data.
Table 3 presents the purchasing behavior variables created from purchase data.
Table 4 presents the access history variables created from web browsing data.
3 Analysis of Loyal Customer
In this study, we analyze behavior on the initial order date for customers who purchase more than once a year, using customer membership information, purchase records, and web access logs from a golf EC site.
First, we evaluated the loyalty of new customers by RFM analysis. We determined customer loyalty from three purchasing behavior indicators (Recency, Frequency, Monetary) and categorized customers as loyal or general on that basis.
Next, we created variables related to the initial purchase and exploratory behavior and constructed a discrimination model of customer loyalty by logistic regression analysis. Through these analyses, we sought to grasp the characteristics of highly loyal customers on the initial order date.
3.1 RFM Analysis
RFM analysis is one of the most common approaches in database marketing and is a proven model for behavior-based customer segmentation. It groups customers by recency, frequency, and monetary value.
We evaluated the loyalty of customers using RFM analysis to divide customers into loyal and general ones [2]. Commonly, the F in RFM analysis is determined by the number of purchases. Here, we defined F as the total number of logins instead of the number of purchases, because frequent browsing behavior also relates to a customer’s loyalty to the website.
RFM stands for the three dimensions:
- Recency: Period since last purchase
- Frequency: Total number of logins within the period
- Monetary: Amount of purchase within the period
RFM scoring assigns each dimension a score on a scale from 1 to 5, where the maximum score represents the preferred behavior.
Customers are divided equally into five groups for each of recency, frequency, and monetary. The maximum score for each dimension is interpreted as follows:
- Recency: The maximum score (5) represents the smallest number of days since the customer’s last purchase within a year.
- Frequency: The maximum score (5) represents the largest number of logins within a year.
- Monetary: The maximum score (5) represents the highest total value of purchases within a year.
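The equal five-way split described above can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the function name `quintile_scores`, the rank-based handling of ties, and all sample values are assumptions made for the example.

```python
def quintile_scores(values, higher_is_better=True):
    """Score each customer 1-5 by splitting the ranked customers into five equal groups.

    Ties are broken by input order (an assumption; the paper does not
    describe how ties were handled).
    """
    n = len(values)
    # Order customers from least preferred (rank 0) to most preferred (rank n-1).
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = 1 + (rank * 5) // n  # ranks 0..n-1 map to scores 1..5
    return scores


# Hypothetical customers (illustrative values only, not from the dataset).
recency_days = [3, 40, 120, 200, 15, 300, 60, 7, 90, 250]   # days since last purchase
logins = [50, 12, 3, 1, 30, 2, 8, 44, 5, 4]                 # logins within the year
amount = [900, 300, 120, 80, 450, 60, 200, 700, 150, 100]   # total purchase amount

r = quintile_scores(recency_days, higher_is_better=False)  # fewer days is better
f = quintile_scores(logins)
m = quintile_scores(amount)
for i, rfm in enumerate(zip(r, f, m)):
    print(f"customer {i}: R={rfm[0]} F={rfm[1]} M={rfm[2]}")
```

Recency is scored with `higher_is_better=False` because a shorter period since the last purchase is the preferred behavior, whereas more logins and a larger purchase amount are preferred for frequency and monetary.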
3.2 Binomial Logistic Regression
The purpose of this study is to predict highly loyal customers using initial purchase and browsing behaviors. When the objective variable to be predicted is binary, binomial logistic regression models are often used.
The binomial logistic regression model is a type of classifier that performs class discrimination. By interpreting the significant explanatory variables in the constructed model, it is possible to clarify the characteristics that affect the presence or absence of repurchase. In binomial logistic regression analysis, the repurchase probability \( p_{i} \) of customer \( i \) is expressed by the following equation [3].
\[ p_{i} = \frac{1}{1 + \exp\left( - \sum_{j} \beta_{j} X_{ij} \right)} \]
- \( X_{ij} : \) Factors affecting repurchase (\( X_{i0} = 1 \))
- \( \beta_{j} : \) Parameters for each explanatory variable (\( \beta_{0} \) is the intercept)
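As a minimal sketch of the standard binomial logistic form, assuming the linear predictor is the sum of \( \beta_{j} X_{ij} \) with \( X_{i0} = 1 \), the probability for one customer could be computed as below. The function name and the coefficient values are hypothetical and chosen only for illustration.

```python
import math


def repurchase_probability(x, beta):
    """Return p_i for one customer under a binomial logistic model.

    x    : explanatory variable values X_i1, X_i2, ... (constant X_i0 = 1 is implicit)
    beta : [beta_0, beta_1, ...], where beta_0 is the intercept
    """
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))  # linear predictor
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients: intercept -1.0 and two positive effects.
beta = [-1.0, 0.8, 0.5]
print(repurchase_probability([0.0, 0.0], beta))  # baseline customer
print(repurchase_probability([2.0, 1.0], beta))  # larger values raise the probability
```

With positive coefficients, larger values of the explanatory variables raise the predicted probability, which is how the sign of a coefficient is read when interpreting a fitted model.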
We prepared variables related to demographic variables, initial purchase behavior and exploratory behavior (Tables 2, 3 and 4) and constructed a discrimination model of customer loyalty by binomial logistic regression analysis. Here, we label the loyal customer as 1, and the general customer as 0.
In logistic regression analysis, when there are too many explanatory variables, the regression equation may be difficult to interpret, or the generalization performance of the prediction may decrease. Multicollinearity may also occur because some variables are highly correlated. Therefore, in this study, to select the truly effective variables, we used the stepwise method based on Akaike’s Information Criterion (AIC).
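To illustrate AIC-based variable selection, the sketch below fits logistic models by Newton-Raphson and adds variables one at a time while the AIC keeps decreasing. It is a simplified, forward-only variant written for illustration; the paper does not specify the direction of its stepwise procedure or the software used, and the data here are synthetic.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50):
    """Fit a logistic model by Newton-Raphson; X must include the intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-(X @ beta))), 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, loglik


def aic(X, y):
    """AIC = 2k - 2 log L, where k is the number of fitted parameters."""
    beta, loglik = fit_logistic(X, y)
    return 2 * len(beta) - 2 * loglik


def forward_stepwise(X, y):
    """Greedily add the variable that lowers AIC most; stop when none does."""
    n, k = X.shape
    ones = np.ones((n, 1))
    selected, best = [], aic(ones, y)
    while len(selected) < k:
        scores = {j: aic(np.hstack([ones, X[:, selected + [j]]]), y)
                  for j in range(k) if j not in selected}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break
        selected.append(j_best)
        best = scores[j_best]
    return selected, best


# Synthetic data: only column 0 truly affects the outcome (illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-2.0 * X[:, 0]))).astype(float)
selected, best_aic = forward_stepwise(X, y)
print("selected columns:", selected, "AIC:", round(best_aic, 1))
```

Each added parameter costs 2 in AIC, so a variable is kept only when it improves the log-likelihood enough to justify the extra complexity.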
To confirm the discrimination accuracy of the model, we divided the data used in the logistic regression analysis into two groups (Group A, Group B) and performed 2-fold cross-validation.
Cross-validation is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice.
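The two-group procedure can be sketched as follows: train on one group, test on the other, then swap. The `fit` and `predict` callables, the random split, and the toy threshold model are placeholders assumed for illustration; they stand in for the logistic regression model and are not the authors' code.

```python
import random


def two_fold_cv(rows, labels, fit, predict, seed=0):
    """Split data into Group A and Group B; train on one, test on the other, then swap."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    groups = {"A": idx[:half], "B": idx[half:]}
    results = {}
    for train_name, test_name in (("A", "B"), ("B", "A")):
        train, test = groups[train_name], groups[test_name]
        model = fit([rows[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, rows[i]) == labels[i] for i in test)
        results[f"train {train_name} / test {test_name}"] = correct / len(test)
    return results


# Toy stand-in model: predict 1 when the feature exceeds the training mean.
def fit_threshold(xs, ys):
    return sum(xs) / len(xs)


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


rows = [float(v) for v in range(20)]          # illustrative single feature
labels = [1 if v >= 10 else 0 for v in rows]  # illustrative labels
cv_results = two_fold_cv(rows, labels, fit_threshold, predict_threshold)
for name, acc in cv_results.items():
    print(name, acc)
```

Because every record is used once for training and once for testing, the two accuracies together indicate how sensitive the model is to the particular split.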
To confirm the prediction accuracy of the constructed model, we performed hold-out validation using the training data and the test data. Specifically, we created a confusion matrix like Table 5 and calculated the prediction accuracy of the constructed model using the following indicators.
Accuracy (ACC): Percentage of all predictions that are correct.
Precision (PRE): Percentage of the cases predicted as the positive class that actually belong to the positive class.
Recall (REC): Percentage of the cases actually in the positive class that are predicted as the positive class.
F-measure: Harmonic mean of PRE and REC.
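These four indicators can be computed directly from the cells of a confusion matrix. The sketch below assumes the usual notation (TP, FP, FN, TN for true positives, false positives, false negatives, and true negatives, with loyal customers as the positive class); the counts are hypothetical and are not taken from the paper's tables.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute ACC, PRE, REC and F-measure from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    pre = tp / (tp + fp) if (tp + fp) else 0.0
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = 2 * pre * rec / (pre + rec) if (pre + rec) else 0.0
    return {"ACC": acc, "PRE": pre, "REC": rec, "F": f_measure}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy alone can hide weak performance on the positive class, which is why precision, recall, and their harmonic mean are reported alongside it.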
4 Results and Discussions
In this section, we present the results of our analysis and discuss them.
4.1 RFM Analysis
Customers were divided into five equal groups for each of recency, frequency, and monetary. The categories for each RFM attribute are shown in Table 6.
Although there were 5,553 target customers in this research, at the time of model construction we randomly sampled general customers so that their number equaled the number of loyal customers.
The sizes of the datasets (Group A, Group B) used in model construction are shown in Table 7.
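Balancing the two classes in this way amounts to random undersampling of the larger class. A minimal sketch follows, assuming customers are identified by IDs held in two lists; the function name, the fixed seed, and the ID values are hypothetical.

```python
import random


def balance_by_undersampling(loyal_ids, general_ids, seed=0):
    """Randomly sample as many general customers as there are loyal customers."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    # random.sample draws without replacement and raises ValueError
    # if there are fewer general customers than loyal customers.
    sampled_general = rng.sample(general_ids, len(loyal_ids))
    return list(loyal_ids), sampled_general


# Hypothetical IDs for illustration only.
loyal_ids = [f"L{i}" for i in range(5)]
general_ids = [f"G{i}" for i in range(20)]
loyal, general = balance_by_undersampling(loyal_ids, general_ids)
print(len(loyal), len(general))
```

With equal class sizes, a model cannot reach high accuracy simply by predicting the majority class for every customer.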
4.2 Binomial Logistic Regression
In each iteration, the model is fit to one group of the data and used to predict the other group.
We built two models that predict loyal customers using binomial logistic regression analysis with the AIC-based stepwise selection method.
The evaluation indicators for confirming the prediction accuracy are shown in Table 8.
Both models achieved higher accuracy than earlier work. Since conventional research on EC sites reported accuracies of about 60%, this research can be said to have attained sufficient prediction accuracy.
The accuracy is higher when Group A is used as the training data. Table 9 shows the partial regression coefficients.
Eleven variables were selected from the 29 candidate variables.
From Table 9, we can see that many of the selected variables were created from purchase data. In addition, the confusion matrix for the test data of this model is shown in Table 10.
4.3 Discussions
We selected the explanatory variables whose significance probability (p-value) was less than 0.05. Eight explanatory variables were selected (Table 11).
Overall, since all the partial regression coefficients are positive, we found that the higher the value of each selected variable, the more likely a customer is to become loyal.
Among all the variables, the total number of items purchased on the initial order date has the largest partial regression coefficient. It seems that loyalty could be improved by raising customer satisfaction, for example by giving coupons or gifts to customers who purchase many items on the initial order date.
Since the partial regression coefficient of “Whether the member registration date matched the initial order date or not” is positive as well, we consider that this relates to customers who were interested for a long time and took a long time to purchase. From this result, it seems that recommending similar items promotes purchasing.
It seems that recommending on-sale items such as men’s wear, golf clubs, and accessories to customers who have registered as members but have not yet purchased would promote purchasing.
It is considered necessary to improve customer loyalty by recommending goods for comparison, without limiting prices, at the initial purchase.
4.4 Verification
We verified the prediction model built here using data from the same period two years later. The results are shown in Tables 12 and 13.
Here, although high prediction accuracy was obtained, the precision was low. This suggests that the model separates loyal and general customers well overall, but that many of the customers it predicted as loyal were not actually loyal.
5 Conclusion
In this study, we determined customer loyalty by RFM analysis and constructed a discrimination model of customer loyalty by logistic regression analysis to find the characteristic behavior of loyal customers on a golf EC site.
Through our analyses, we built a useful model to predict loyal customers using the web access logs and purchase records at the initial purchase on a golf EC site. As a result, we clarified the initial purchase and browsing behavior of highly loyal customers and proposed marketing measures. Even on data from two years later, the model achieved high accuracy.
However, in this study we made predictions from data at a single point in time. It is also important to check the prediction accuracy for loyal customers by analyzing data as they change over time.
References
1. Ministry of Economy, Trade and Industry: Foundation for Data-Driven Society in Japan (Market Survey on Electronic Commerce) (2018). (in Japanese)
2. Nakamura, H. (ed.): Market Segmentation - Discovery of Sales Opportunities Using Purchase History Data. Hakuto Shobo (2008). (in Japanese)
3. Yamashita, H., Suzuki, H.: Analysis of purchasing behavior of customers focusing on sale items: logistic regression analysis with consideration of clustering of binary data. Commun. Oper. Res. Soc. Jpn. 60(2), 81–88 (2015). (in Japanese)
4. Sato, Y., Namatame, T., Otake, K.: Analysis of the characteristics of repeat customer in a golf EC site. In: International Conference on Social Computing and Social Media, SCSM 2017: Social Computing and Social Media. Human Behavior, pp. 223–233 (2017)
Acknowledgment
We thank Golf Digest Online Inc. for permission to use valuable datasets and for useful comments. This work was supported by JSPS KAKENHI Grant Number 16K03944 and 17K13809.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Su, Y., Otake, K., Namatame, T. (2019). Analysis of the Characteristic Behavior of Loyal Customers on a Golf EC Site. In: Meiselwitz, G. (eds) Social Computing and Social Media. Communication and Social Communities. HCII 2019. Lecture Notes in Computer Science(), vol 11579. Springer, Cham. https://doi.org/10.1007/978-3-030-21905-5_37
DOI: https://doi.org/10.1007/978-3-030-21905-5_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21904-8
Online ISBN: 978-3-030-21905-5
eBook Packages: Computer Science, Computer Science (R0)