Analysis of the Characteristic Behavior of Loyal Customers on a Golf EC Site

Su, Yue; Otake, Kohei; Namatame, Takashi

doi:10.1007/978-3-030-21905-5_37

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11579)

Included in the following conference series:

International Conference on Human-Computer Interaction

Abstract

In recent years, with the expansion and growth of the electronic commerce (EC) market, competition to acquire customers is expected to become fierce. EC companies therefore need to identify, as early as possible, new customers who have the potential to become loyal customers. In this study, we analyze customer behavior using customer membership information, purchase records, and web access logs from a golf EC site. First, we evaluate customer loyalty using RFM analysis to divide customers into loyal and general ones. Next, we perform logistic regression to discriminate loyalty using first-time purchase and browsing behaviors. Through this analysis, we build a model to predict loyal customers and clarify the characteristic behaviors of highly loyal customers.

1 Introduction

In recent years, electronic commerce (hereinafter called “EC”) has continued to evolve at a rapid pace [1]. With the expansion and growth of the EC market, competition to acquire customers is expected to become fierce. Choosing appropriate target customers is therefore very important for expanding sales and improving profitability.

EC companies therefore need to identify, as early as possible, new customers who have the potential to become loyal customers. The first purchase date can be regarded as a key point for this purpose, and we look for behaviors that these customers have in common at their initial purchase. When customer satisfaction rises, companies improve their sales and profits, so a relationship in which both sides benefit is desirable.

Figure 1 shows the framework of the customer hierarchy. First, customers visit the website. Upper-level customers purchase frequently and spend large amounts. Finding these loyal customers and developing new ones are therefore very important strategies for a retail company.

In this study, we focus on new customers. Our purpose is to clarify the characteristic behaviors of highly loyal customers using customer membership information, purchase data, and access history data.

2 Datasets

We target a general electronic commerce website (hereinafter called “EC site”) related to golf. The EC site provides services such as sales of golf equipment, reservations for golf courses, and golf score management. From these services, we used the following data.

Customer information data (age, sex, registration date, etc.)
Purchase history data (category of purchase items, purchase date, whether purchased item is brand-new or secondhand, etc.)
Access history data (log in date and time, URL of access page, URL of referrer page, etc.)

The category names of the products included in the purchase data are shown in Table 1.

Table 1. Category name of item

Target Customer

In this study, we analyzed 5,553 customers who made their first purchase between May 1, 2015 and July 30, 2015 and who purchased more than twice within one year of the initial purchase date. We excluded customers for whom more than one year had passed since registration.

Figure 2 shows the target period used in this research.

Explanatory Variables

Using the above data, we considered the factors that affect the first purchase. Based on the result, we created explanatory variables covering the customer’s member information (5 variables), purchasing behavior at the time of the initial purchase (11 variables), and web browsing behavior on the initial purchase date (13 variables) [4].

Details of the explanatory variables are shown in Tables 2, 3 and 4.

Table 2. Demographic variables used in the model construction.

Table 3. Purchasing behavior used in the model construction.

Table 4. Access history variables used in the model construction.

Table 2 presents the demographic variables created from the membership information data.

Table 3 presents the purchasing behavior variables created from the purchase data.

Table 4 presents the access history variables created from the web browsing data.

3 Analysis of Loyal Customer

In this study, we analyze behavior on the initial order date for customers who purchase more than once a year, using customer membership information, purchase records, and web access logs from a golf EC site.

First, we evaluated the loyalty of new customers by RFM analysis. We determined each customer’s loyalty from three purchasing behavior indicators (Recency, Frequency, Monetary) and, on that basis, categorized customers as loyal or general.

Next, we created variables related to the initial purchase and exploratory behavior and constructed a discrimination model of customer loyalty by logistic regression analysis. Through these analyses, we sought to grasp the characteristics of highly loyal customers on the initial order date.

3.1 RFM Analysis

RFM analysis is one of the most common approaches in database marketing and a proven marketing model for behavior-based customer segmentation. It groups customers by recency, frequency, and monetary value.

We evaluated the loyalty of customers using RFM analysis to divide them into loyal and general customers [2]. Commonly, the F in RFM analysis is determined by the number of purchases. Here, we defined F as the total number of logins instead of the number of purchases, because frequent browsing behavior is also related to a customer’s loyalty to the website.

RFM stands for the three dimensions:

Recency: Period since last purchase
Frequency: Total number of logins within the period
Monetary: Amount of purchase within the period

The approach to RFM is to assign a score for each dimension on a scale from 1 to 5. The maximum score represents the preferred behavior.

Customers are divided into five equal groups for each of recency, frequency, and monetary. The maximum score for each of the three dimensions means the following:

Recency: The maximum score (5) represents the smallest number of days since the customer’s last purchase within a year.
Frequency: The maximum score (5) represents the largest number of logins within a year.
Monetary: The maximum score (5) represents the highest total value of purchases within a year.
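
As a rough illustration of the scoring described above, the following Python sketch assigns scores from 1 to 5 by splitting ranked customers into five groups of nearly equal size. The field names (`days_since_last_purchase`, `logins`, `amount`), the tie-breaking rule, and the sample records are assumptions made for this example; they are not taken from the dataset or from Table 6.

```python
from typing import Dict, List


def quintile_scores(values: List[float], higher_is_better: bool = True) -> List[int]:
    """Score each value from 1 to 5 by splitting the ranked values into five
    groups of (nearly) equal size. Ties are broken by input order, which is a
    simplification made for this sketch."""
    n = len(values)
    # Sort indices so that the least preferred behavior comes first (score 1).
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, idx in enumerate(order):
        scores[idx] = rank * 5 // n + 1
    return scores


def rfm_scores(customers: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, int]]:
    """Compute R, F and M scores for every customer."""
    ids = list(customers)
    # Recency: fewer days since the last purchase is the preferred behavior.
    r = quintile_scores(
        [customers[c]["days_since_last_purchase"] for c in ids],
        higher_is_better=False,
    )
    # Frequency is based on logins, following the definition in the text.
    f = quintile_scores([customers[c]["logins"] for c in ids])
    m = quintile_scores([customers[c]["amount"] for c in ids])
    return {c: {"R": r[i], "F": f[i], "M": m[i]} for i, c in enumerate(ids)}


# Hypothetical records, for illustration only.
example_customers = {
    "c1": {"days_since_last_purchase": 3, "logins": 40, "amount": 52000},
    "c2": {"days_since_last_purchase": 200, "logins": 2, "amount": 4000},
    "c3": {"days_since_last_purchase": 30, "logins": 15, "amount": 21000},
    "c4": {"days_since_last_purchase": 90, "logins": 8, "amount": 9000},
    "c5": {"days_since_last_purchase": 10, "logins": 25, "amount": 30000},
}
example_scores = rfm_scores(example_customers)
```

With these five hypothetical customers, `c1` receives the maximum score on all three dimensions and `c2` the minimum. How the three scores are combined into the loyal or general label is not covered by this sketch.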

3.2 Binomial Logistic Regression

The purpose of this study is to predict highly loyal customers using initial purchase and browsing behaviors. When the objective variable to be predicted is binary, binomial logistic regression models are often used.

The binomial logistic regression model is a type of classifier that performs class discrimination. By interpreting the significant explanatory variables in the constructed model, it is possible to clarify the characteristics that affect the presence or absence of repurchase. In binomial logistic regression analysis, the customer’s repurchase probability $ p_{i} $ is expressed by the following equation [3].

$$ p_{i} = \frac{{\exp \left\{ {\sum\nolimits_{j = 0}^{m} {\beta_{j} X_{ij} } } \right\}}}{{1 + \exp \left\{ {\sum\nolimits_{j = 0}^{m} {\beta_{j} X_{ij} } } \right\}}} $$

(1)

$ X_{ij} $: Factors affecting repurchase ($ X_{i0} = 1 $)
$ \beta_{j} $: Parameters for each explanatory variable ($ \beta_{0} $ is the intercept)
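
Equation (1) can be evaluated directly in a few lines of Python. This is a minimal sketch: the function name and the coefficient values in the example call are hypothetical and are not estimates from this study.

```python
import math
from typing import Sequence


def loyal_probability(beta: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate Eq. (1): p_i = exp(z) / (1 + exp(z)) with z = sum_j beta_j * X_ij.

    x must start with 1.0 (X_i0 = 1) so that beta[0] acts as the intercept.
    """
    if len(beta) != len(x):
        raise ValueError("beta and x must have the same length")
    z = sum(b * v for b, v in zip(beta, x))
    # Algebraically identical to Eq. (1), arranged to avoid overflow in exp().
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical intercept and two coefficients, applied to one customer.
example_p = loyal_probability([-1.0, 0.4, 0.2], [1.0, 2.0, 1.0])
```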

We prepared demographic variables, initial purchase behavior variables, and exploratory behavior variables (Tables 2, 3 and 4) and constructed a discrimination model of customer loyalty by binomial logistic regression analysis. Here, we label loyal customers as 1 and general customers as 0.

In logistic regression analysis, when there are too many explanatory variables, the regression equation may become difficult to interpret, or the generalizability of the prediction of the objective variable may decrease. Multicollinearity may also occur because some variables are highly correlated. Therefore, to select the truly effective variables, we used the stepwise method based on Akaike’s Information Criterion (AIC).
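
The stepwise method compares candidate models by AIC, defined as AIC = 2k - 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood; the model with the smaller value is preferred. The sketch below computes this quantity for a binary outcome. The labels and predicted probabilities are made-up values for illustration, not fitted results from this study, and the stepwise search itself is omitted.

```python
import math
from typing import Sequence


def log_likelihood(y: Sequence[int], p: Sequence[float]) -> float:
    """Bernoulli log-likelihood of labels y (0 or 1) given predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def aic(y: Sequence[int], p: Sequence[float], n_params: int) -> float:
    """AIC = 2k - 2 ln L, where k counts the intercept and all coefficients."""
    return 2 * n_params - 2 * log_likelihood(y, p)


# Hypothetical comparison of two candidate models on four customers.
labels = [1, 0, 1, 0]
p_intercept_only = [0.5, 0.5, 0.5, 0.5]   # 1 parameter
p_two_variables = [0.8, 0.3, 0.7, 0.2]    # 3 parameters
aic_small = aic(labels, p_intercept_only, n_params=1)
aic_large = aic(labels, p_two_variables, n_params=3)
```

In this toy comparison the larger model fits better, but not by enough to offset its two extra parameters, so the intercept-only model has the lower AIC.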

To confirm the discrimination accuracy of the model, we divided the data used in the logistic regression analysis into two groups (Group A and Group B) and performed 2-fold cross-validation.

Cross-validation is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice.
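
A minimal sketch of the 2-fold procedure follows, assuming customers are identified by string IDs and are assigned to the two groups at random (the text does not state how Group A and Group B were formed). Model fitting and evaluation are left out; only the split and the swap of roles are shown.

```python
import random
from typing import List, Sequence, Tuple


def two_fold_split(ids: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly split customer IDs into two groups of (nearly) equal size."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def two_fold_rounds(
    group_a: List[str], group_b: List[str]
) -> List[Tuple[List[str], List[str]]]:
    """Return (training, test) pairs so that each group is used once for testing."""
    return [(group_a, group_b), (group_b, group_a)]


customer_ids = [f"customer_{i}" for i in range(10)]  # hypothetical IDs
group_a, group_b = two_fold_split(customer_ids)
for train_ids, test_ids in two_fold_rounds(group_a, group_b):
    # A model would be fit on train_ids and evaluated on test_ids here.
    print(len(train_ids), "training /", len(test_ids), "test")
```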

To confirm the prediction accuracy of the constructed model, we performed hold-out validation using the training data and the test data. Specifically, we created a confusion matrix like Table 5 and calculated the prediction accuracy of the constructed model using the following equations.

Table 5. Confusion matrix

Accuracy (ACC): Percentage of all predictions that are correct.

$$ {\text{ACC}} = \frac{TP + TN}{FP + FN + TP + TN} $$

(2)

Precision (PRE): Percentage of the cases predicted as the positive class that actually belong to the positive class.

$$ {\text{PRE}} = \frac{TP}{TP + FP} $$

(3)

Recall (REC): Percentage of the cases actually in the positive class that are predicted as the positive class.

$$ {\text{REC}} = \frac{TP}{FN + TP} $$

(4)

F-measure: Harmonic mean of PRE and REC.

$$ {\text{F-measure}} = 2 \times \frac{PRE \times REC}{PRE + REC} $$

(5)
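
Equations (2) to (5) translate directly into code. The counts in the example call are hypothetical and are not the values in Table 5 or Table 10; returning 0.0 when a denominator is zero is also an assumption of this sketch.

```python
from typing import Dict


def evaluation_indicators(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute ACC, PRE, REC and F-measure (Eqs. 2 to 5) from confusion matrix counts."""
    total = tp + fp + fn + tn
    acc = (tp + tn) / total if total else 0.0
    pre = tp / (tp + fp) if (tp + fp) else 0.0
    rec = tp / (fn + tp) if (fn + tp) else 0.0
    f_measure = 2 * pre * rec / (pre + rec) if (pre + rec) else 0.0
    return {"ACC": acc, "PRE": pre, "REC": rec, "F-measure": f_measure}


# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
example_indicators = evaluation_indicators(tp=40, fp=10, fn=20, tn=30)
```

Multiplying each value by 100 gives the percentages reported in the evaluation tables.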

4 Results and Discussions

In this section, we present the results of our analysis and discuss them.

4.1 RFM Analysis

Customers were divided into five equal groups for each of recency, frequency, and monetary. The categories for each RFM attribute are shown in Table 6.

Table 6. Categories for each attribute of RFM

Although the number of target customers in this research was 5,553, at the time of model construction we randomly sampled general customers so that their number equaled the number of loyal customers.
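
The balancing step can be sketched as random undersampling of the majority class. The function name, the fixed seed, and the ID lists below are hypothetical; the sketch assumes that general customers outnumber loyal customers, as the text implies.

```python
import random
from typing import List, Sequence, Tuple


def balance_by_undersampling(
    loyal_ids: Sequence[str], general_ids: Sequence[str], seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Randomly keep as many general customers as there are loyal customers."""
    if len(general_ids) < len(loyal_ids):
        raise ValueError("expected at least as many general as loyal customers")
    # Sampling is without replacement, so no general customer is picked twice.
    sampled_general = random.Random(seed).sample(list(general_ids), k=len(loyal_ids))
    return list(loyal_ids), sampled_general


# Hypothetical IDs: 3 loyal customers and 10 general customers.
loyal = ["L1", "L2", "L3"]
general = [f"G{i}" for i in range(10)]
balanced_loyal, balanced_general = balance_by_undersampling(loyal, general)
```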

The sizes of the datasets (Group A and Group B) used in model construction are shown in Table 7.

Table 7. Datasets used in prediction model

4.2 Binomial Logistic Regression

In each iteration, the model is fit to one group of the data and used to predict the other group.

We built two models that predict loyal customers using binomial logistic regression analysis with the AIC-based stepwise selection method.

The evaluation indicators used to confirm prediction accuracy are shown in Table 8.

Table 8. Evaluation indicator of model for customers (%)

Both models achieved accuracies above those reported in conventional research on EC sites, which were about 60%; this research can therefore be said to have gained sufficient prediction accuracy.

The accuracy is higher when Group A is used as training data. Table 9 shows the partial regression coefficients.

Table 9. Partial regression coefficients.

Eleven variables were selected from the 29 candidate variables.

From Table 9, we can see that many of the selected variables were created from the purchase data. In addition, the confusion matrix for the test data of this model is shown in Table 10.

Table 10. Confusion matrix of model for customers

4.3 Discussions

We then selected the explanatory variables whose coefficients had a significance probability (p-value) of less than 0.05. Eight explanatory variables were selected (Table 11).

Table 11. Estimated value of selected partial regression coefficient

Overall, since all the partial regression coefficients are positive, we found that the higher the value of each selected variable, the more likely the customer is to become a loyal customer.
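
One way to read a positive coefficient follows from Eq. (1): the odds $ p_{i} / (1 - p_{i}) $ equal $ \exp \{ \sum_{j} \beta_{j} X_{ij} \} $, so raising $ X_{ij} $ by one unit multiplies the odds by $ \exp (\beta_{j}) $. The snippet below checks this numerically with a hypothetical intercept of -1.0 and a hypothetical coefficient of 0.4, neither of which is a value from Table 11.

```python
import math


def probability(z: float) -> float:
    """Logistic function from Eq. (1), written as 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p: float) -> float:
    """Odds of being a loyal customer for probability p."""
    return p / (1.0 - p)


intercept, beta_j = -1.0, 0.4          # hypothetical values
p_before = probability(intercept + beta_j * 2)   # X_ij = 2
p_after = probability(intercept + beta_j * 3)    # X_ij = 3
odds_ratio = odds(p_after) / odds(p_before)      # should equal exp(beta_j)
```

Because the exponential of a positive number is greater than 1, any positive coefficient raises the odds, and hence the probability, of being classified as loyal.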

Among all the variables, the total number of items purchased on the initial order date has the largest partial regression coefficient. Loyalty could likely be improved by raising customer satisfaction, for example by giving coupons or gifts to customers who purchase many items on the initial order date.

Since the partial regression coefficient of “Whether the member registration date matched the initial order date or not” is also positive, we consider that these are customers who had been interested for a long time and took a long time to purchase. From this result, it seems that recommending similar items promotes purchasing.

It seems that recommending sale items in the men’s wear, golf club, and accessory categories to customers who have registered as members but have not yet purchased would promote purchasing.

We also consider it necessary to improve customer loyalty by recommending items for comparison at the initial purchase without restricting the price range.

4.4 Verification

Using the prediction model built in this study, we verified it on data from the same period two years later. The results are shown in Tables 12 and 13.

Table 12. Confusion matrix of model for customers

Table 13. Evaluation indicator of model for customers (%)

Although high prediction accuracy was obtained here, the precision was low. We consider that the model distinguishes loyal customers from general customers well overall, but that many of the customers it predicts as loyal are not actually loyal.

5 Conclusion

In this study, we determined customer loyalty by RFM analysis and constructed a discrimination model of customer loyalty by logistic regression analysis in order to find the characteristic behaviors of loyal customers on a golf EC site.

Through our analyses, we built a useful model to predict loyal customers using web access logs and purchase records at the initial purchase on a golf EC site. As a result, we clarified the initial purchase and browsing behaviors of highly loyal customers and proposed marketing measures. The model also achieved high accuracy on data from two years later.

However, in this study the prediction is based on data from a single point in time. It is important to check the prediction accuracy for loyal customers by also analyzing data that capture transitions over time.

References

1. Ministry of Economy, Trade and Industry: Foundation for Data-Driven Society in Japan (Market Survey on Electronic Commerce) (2018). (in Japanese)
2. Nakamura, H. (ed.): Market Segmentation - Discovery of Sales Opportunities Using Purchase History Data. Hakuto Shobo (2008). (in Japanese)
3. Yamashita, H., Suzuki, H.: Analysis of purchasing behavior of customers focusing on sale items: logistic regression analysis with consideration of clustering of binary data. Commun. Oper. Res. Soc. Jpn. 60(2), 81–88 (2015). (in Japanese)
4. Sato, Y., Namatame, T., Otake, K.: Analysis of the characteristics of repeat customer in a golf EC site. In: International Conference on Social Computing and Social Media, SCSM 2017: Social Computing and Social Media. Human Behavior, pp. 223–233 (2017)

Acknowledgment

We thank Golf Digest Online Inc. for permission to use valuable datasets and for useful comments. This work was supported by JSPS KAKENHI Grant Numbers 16K03944 and 17K13809.

Author information

Authors and Affiliations

Graduate School of Science and Engineering, Chuo University, 1-13-27, Kasuga, Bunkyo-ku, Tokyo, 112-8551, Japan
Yue Su
School of Information and Telecommunication Engineering, Tokai University, 2-3-23, Takanawa, Minato-ku, Tokyo, 108-8619, Japan
Kohei Otake
Faculty of Science and Engineering, Chuo University, 1-13-27, Kasuga, Bunkyo-ku, Tokyo, 112-8551, Japan
Takashi Namatame

Corresponding author

Correspondence to Yue Su.

Editor information

Editors and Affiliations

Computer Science, Towson University, Towson, MD, USA
Gabriele Meiselwitz

About this paper

Cite this paper

Su, Y., Otake, K., Namatame, T. (2019). Analysis of the Characteristic Behavior of Loyal Customers on a Golf EC Site. In: Meiselwitz, G. (eds) Social Computing and Social Media. Communication and Social Communities. HCII 2019. Lecture Notes in Computer Science, vol 11579. Springer, Cham. https://doi.org/10.1007/978-3-030-21905-5_37

DOI: https://doi.org/10.1007/978-3-030-21905-5_37
Published: 12 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21904-8
Online ISBN: 978-3-030-21905-5
eBook Packages: Computer Science (R0)
