
1 Introduction

Recently, detailed customer data on electronic-commerce (EC) sites has become accessible, and customer behavior can be analyzed in more depth than before. In particular, access log data extends the possibilities for analyzing customer behavior and makes it possible to express customers' information-searching behavior. In this paper, we mainly use web access log data to predict purchasing and related behaviors on an EC site.

Nowadays, neural networks and their extensions are applied to a variety of business objectives. With the improvement of computing performance, deep learning has been applied to many tasks. In particular, convolutional neural networks (CNN), which are often used in image recognition, have been applied to churn analysis [1]. AlphaZero may be the most famous example [2]. For classification and prediction tasks, most deep learning models outperform conventional machine learning approaches (e.g., Support Vector Machines (SVM) and logistic regression), but they have a drawback: it is difficult to explain how the models make their decisions. In this study, we attempt to find features of time-series customer access log data with convolutional neural networks and to read purchasing signs from the hidden layer states. Furthermore, we discuss transforming non-image datasets into image-like data.

2 Datasets

In this study, we use purchase records and web access log data from a golf EC site. Aggregating session log data per user and per day over a month, we make a matrix of 30 rows and 11 columns for each user. The 11 columns are the types of pages accessed, such as new item pages, old item pages, and news pages. We also label each user by whether or not they purchase in the next month, which serves as the supervised label. We build several datasets in which the feature columns are placed in different orders, for example in order of their correlation with purchasing, or rearranged according to other rules. These are enumerated in Tables 1 and 2 (Fig. 1).

Table 1. Locating features samples
Table 2. Correlations of features about purchase
Fig. 1. The image of input data.
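As an illustration, the session logs could be aggregated into these 30 × 11 matrices roughly as follows. This is only a sketch: the log schema (user_id, date, page_type) and the full list of page-type labels are hypothetical, since only a few page types are named above.

```python
import numpy as np
import pandas as pd

# Hypothetical access-log schema: one row per page view with user_id, date, page_type.
# Only "new_item", "old_item", and "news" are named in the paper; the rest are placeholders.
PAGE_TYPES = ["new_item", "old_item", "news", "outlet", "gear", "sale",
              "brand", "ranking", "event", "reserve", "other"]

def logs_to_matrices(log_df: pd.DataFrame, days_in_month: int = 30) -> dict:
    """Aggregate page views per user into a (30 days x 11 page types) matrix."""
    log_df = log_df.copy()
    log_df["day"] = pd.to_datetime(log_df["date"]).dt.day.clip(upper=days_in_month)
    counts = (log_df
              .groupby(["user_id", "day", "page_type"])
              .size()
              .unstack(fill_value=0)                      # columns: page types
              .reindex(columns=PAGE_TYPES, fill_value=0))
    matrices = {}
    for user_id, user_counts in counts.groupby(level="user_id"):
        mat = np.zeros((days_in_month, len(PAGE_TYPES)))
        days = user_counts.index.get_level_values("day").to_numpy() - 1
        mat[days, :] = user_counts.to_numpy()             # 30 x 11 image-like array
        matrices[user_id] = mat
    return matrices
```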

3 Methods

3.1 Convolutional Neural Networks

Convolutional neural networks (CNN), a type of deep learning method, consist of convolutional and pooling layers. In a convolutional layer, the input is filtered with small numerical kernels to extract features, transforming a local area of the image into a rough feature. In a pooling layer, the features mapped by the convolutional layers are further summarized by taking the maximum or the average. After the convolutional layers, activation layers map the features into a certain state, which lets us grasp them. The network then produces output values, and back propagation calculates the error of every unit in each layer. These layers produce feature maps from images. The algorithm is as follows.

$$ \left\{ \begin{aligned} u_{m,n,k'}^{l+1} &= \sum\nolimits_{p,q,k} w_{p,q,k,k'}^{l+1}\, z_{m+p,\,n+q,\,k}^{l} + b_{k'}^{l+1} \\ z_{m,n,k'}^{l+1} &= h\!\left( u_{m,n,k'}^{l+1} \right) \end{aligned} \right. $$
(1)

where
$$ \left\{ \begin{aligned} &k: \text{channel index},\; k = 1,2,\ldots,K^{l} \\ &k': \text{kernel (output channel) index},\; k' = 1,2,\ldots,K^{l+1} \\ &m,n: \text{row and column indices} \\ &p,q: \text{filter indices},\; p = 1,2,\ldots,P^{l+1},\; q = 1,2,\ldots,Q^{l+1} \end{aligned} \right. $$

$$ \left\{ \begin{aligned} &u_{m,n,k'}^{l+1}: \text{pre-activation of pixel } (m,n) \text{ at layer } l+1 \text{ in kernel } k' \\ &w_{p,q,k,k'}^{l+1}: \text{weight at filter position } (p,q) \text{ of layer } l+1 \text{ from channel } k \text{ to kernel } k' \\ &z_{m+p,\,n+q,\,k}^{l}: \text{output of pixel } (m+p,n+q) \text{ at layer } l \text{ in channel } k \\ &b_{k'}^{l+1}: \text{bias of kernel } k' \\ &h(x): \text{activation function} \end{aligned} \right. $$
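As a concrete illustration, Eq. (1) can be written directly in NumPy as a plain valid convolution (no padding or stride). This is for exposition only, not the implementation used in the study.

```python
import numpy as np

def conv_forward(z_l, w, b, h=lambda x: np.maximum(x, 0.0)):
    """Forward pass of Eq. (1).
    z_l : (M, N, K)       outputs of layer l
    w   : (P, Q, K, K')   filter weights of layer l+1
    b   : (K',)           bias per kernel
    h   : activation function (ReLU by default)
    Returns u (pre-activation) and z (activation) of layer l+1."""
    M, N, K = z_l.shape
    P, Q, _, K_out = w.shape
    u = np.zeros((M - P + 1, N - Q + 1, K_out))
    for m in range(M - P + 1):
        for n in range(N - Q + 1):
            for k_out in range(K_out):
                # u_{m,n,k'} = sum_{p,q,k} w_{p,q,k,k'} z_{m+p,n+q,k} + b_{k'}
                u[m, n, k_out] = np.sum(w[:, :, :, k_out] * z_l[m:m+P, n:n+Q, :]) + b[k_out]
    z = h(u)  # z_{m,n,k'} = h(u_{m,n,k'})
    return u, z
```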

The filters of the convolutional layers are initialized according to a rule. In this study, we initialize the filter weights with the He normal initializer [5]: when n is the number of filter pixels, the weights are drawn from a normal distribution with mean 0 and variance 2/n.
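A minimal sketch of this initialization follows. The text above describes n as the number of filter pixels; He et al. [5] use the fan-in (filter height × width × input channels), which is what this sketch assumes.

```python
import numpy as np

def he_normal(shape, rng=None):
    """He-normal initialization: weights ~ N(0, 2/n), with n the fan-in of the filter."""
    rng = rng or np.random.default_rng()
    p, q, k_in, _ = shape            # (filter height, filter width, in-channels, out-channels)
    fan_in = p * q * k_in
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=shape)

w = he_normal((3, 3, 64, 64))        # e.g. a 3x3 filter over 64 input channels
```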

There are several activation functions, for example ReLU, sigmoid, and Leaky ReLU; ReLU and sigmoid are shown in Fig. 2. The architecture of the CNN is shown in Fig. 3.

Fig. 2. ReLU and Sigmoid.
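For reference, the activation functions mentioned above can be written as follows; the Leaky ReLU slope α is a free parameter that is not specified in this paper.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def leaky_relu(x, alpha=0.01):       # alpha = 0.01 is an assumed default
    return np.where(x > 0, x, alpha * x)
```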

Fig. 3. The architecture of convolutional neural networks [4].

The back propagation is as follows.

$$ w_{t} = w_{t-1} - \eta \frac{dE}{dw} $$
(2)
$$ \frac{dE}{dw} = \frac{dE}{dy}\,\frac{dy}{dt}\,\frac{dt}{dw} $$
(3)
where
$$ \left\{ \begin{aligned} &E = -\sum\nolimits_{n} \log \left( y_{\text{correct label of sample } n}^{[n]} \right) \\ &\frac{dy}{dt} = f'(t) \\ &\frac{dt}{dw} = u \\ &f(t): \text{activation function} \\ &u: \text{output of the hidden layer} \end{aligned} \right. $$

We obtain new weights by repeating this calculation from the output layer back to the input layer.
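As an illustration of Eqs. (2) and (3), a single gradient-descent update for a softmax output layer with the cross-entropy loss E can be sketched as follows. For softmax with cross-entropy, the product (dE/dy)(dy/dt) collapses to the softmax output minus the one-hot label; the learning rate here is an arbitrary choice, not the paper's setting.

```python
import numpy as np

def sgd_step(w, u, y, labels, eta=0.01):
    """One update w <- w - eta * dE/dw (Eq. 2) for a softmax output layer.
    w      : (D, C) weights of the output layer
    u      : (N, D) hidden-layer outputs (inputs to this layer)
    y      : (N, C) softmax outputs
    labels : (N,)   correct class index per sample"""
    # dE/dt for softmax + cross-entropy: y - one_hot(labels), summed over samples
    grad_t = y.copy()
    grad_t[np.arange(len(labels)), labels] -= 1.0
    grad_w = u.T @ grad_t            # dE/dw = (dE/dt) * (dt/dw), with dt/dw = u (Eq. 3)
    return w - eta * grad_w          # Eq. (2)
```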

3.2 Residual Network (Resnet)

Among the well-known CNN architectures, ResNet, which won first place in ILSVRC 2015 [3], is one of the most famous. The depth of the convolutional layers is very important for a CNN: deeper networks can extract richer feature maps and therefore tend to be more accurate. However, simply stacking many convolutional layers can actually make accuracy worse; in fact, a 20-layer model is more accurate than a 56-layer model [3]. ResNet solves this problem. Residual learning adds the input of each residual block back to its output, as shown in Fig. 4.

Fig. 4. The examples of residual block [3].

With residual blocks in the model, ResNet becomes more accurate. By using residual blocks, ResNet allows a CNN to have many convolutional layers, as shown in Fig. 5 [3].
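As an illustration, a single residual block of the kind shown in Fig. 4 could be written as follows with tf.keras; the framework and the exact layer ordering are our assumptions, guided by the description in Sect. 3.2 (batch normalization after each convolution).

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, channels=64, kernel_size=3):
    """y = F(x) + x: two convolutions plus the identity shortcut."""
    shortcut = x
    y = layers.Conv2D(channels, kernel_size, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(channels, kernel_size, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])      # the residual (skip) connection
    return layers.Activation("relu")(y)
```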

Fig. 5. The results of ResNet on ImageNet [3].

We use a ResNet architecture in this study. The network has one convolutional layer at the beginning, one at the end, and 9 residual blocks in between, like the left example in Fig. 4; every residual block has a batch normalization layer after each convolutional layer. The filter size of every residual block is 3 × 3, the number of channels is 64, the training batch size is 64, and so on. The CNN architecture is summarized in Table 3.

Table 3. The architecture of CNN
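Putting this together, the following is a hedged sketch of the model summarized in Table 3 (one initial convolutional layer, 9 residual blocks with 3 × 3 filters and 64 channels, and one final convolutional layer), reusing the residual_block function sketched above. The pooling layer, output head, layer name "last_conv", optimizer, and loss are assumptions, since Table 3 is not reproduced here.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_model(input_shape=(30, 11, 1), n_blocks=9, channels=64):
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(channels, 3, padding="same", activation="relu")(inputs)  # first conv
    for _ in range(n_blocks):                                                   # 9 residual blocks
        x = residual_block(x, channels=channels, kernel_size=3)
    x = layers.Conv2D(channels, 3, padding="same", activation="relu",
                      name="last_conv")(x)                                      # final conv
    x = layers.GlobalAveragePooling2D()(x)                                      # assumed pooling
    outputs = layers.Dense(2, activation="softmax")(x)                          # purchase / no purchase
    return Model(inputs, outputs)

model = build_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])        # optimizer and loss are assumptions
```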

4 Data Preprocessing

Access log data is very sparse, so in order to make purchase data easier to discriminate, we transform every pixel value by subtracting it from 256. In addition, we generate impulse-noise and median-filtered copies of the purchase and non-purchase data to increase the variety of the training patterns. We then standardize all data. Lastly, we split the data into training and test sets and hold out validation data from the training set.
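A sketch of this preprocessing follows, assuming the 30 × 11 count matrices from Sect. 2 have been scaled to a 0-255 range; the dummy data, scaling, and split ratios are assumptions, since they are not given in the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data stands in for the real golf-EC logs, which are not public.
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(1000, 30, 11)).astype(float)  # 30x11 matrices in 0-255
y = rng.integers(0, 2, size=1000)                            # next-month purchase labels

X = 256.0 - X                                 # invert: sparse zero pixels become bright
X = (X - X.mean()) / X.std()                  # standardize all data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train,
                                                  test_size=0.2, random_state=0)  # assumed ratios
```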

5 Modeling

We fit the model on the training data and evaluate it on the test data. The results are shown in Table 4.

Table 4. Score of model
  • Datasets 1: no noise added to the training data; features placed in order of high correlation with purchasing

  • Datasets 2: impulse noise and median filter added to the training data; features placed in order of high correlation

  • Datasets 2′: impulse noise and median filter added to the training data; features placed in order of high correlation from side to side

  • Datasets 3: impulse noise added to the training data; features placed in order of high correlation
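Continuing the preprocessing sketch from Sect. 4, the impulse noise and median filter used to build Datasets 2, 2′, and 3 could look like the following; the noise rate and filter size are assumptions.

```python
import numpy as np
from scipy.ndimage import median_filter

def add_impulse_noise(x, rate=0.05, rng=None):
    """Overwrite a random fraction of pixels with the extreme values of the data
    (salt-and-pepper noise); the 5% rate is an assumption."""
    rng = rng or np.random.default_rng(0)
    noisy = x.copy()
    mask = rng.random(x.shape) < rate
    noisy[mask] = rng.choice([x.min(), x.max()], size=int(mask.sum()))
    return noisy

def apply_median_filter(x, size=3):
    """Median-filter each 30x11 matrix separately; the 3x3 window is an assumption."""
    return np.stack([median_filter(m, size=size) for m in x])

# Augmented copies are appended to the training data only (e.g. Datasets 2 and 2').
X_train_aug = np.concatenate([X_train, add_impulse_noise(X_train), apply_median_filter(X_train)])
y_train_aug = np.concatenate([y_train, y_train, y_train])
```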

6 Discussion

First, we discuss the datasets. As can be seen from Datasets 2 and Datasets 2′ in Table 4, the best way of placing the features is in order of high correlation with purchasing. Furthermore, comparing with Datasets 1 in Table 4, adding noise to the training data is effective for modeling. However, the median filter is not very effective preprocessing for increasing the data patterns, whereas impulse noise is effective preprocessing for image-like customer behavior data and improves the discrimination. In short, for discriminating image-like customer behavior data, which is sparse, impulse noise should be added.

Next, we discuss purchase behaviors. From here on, we use Datasets 3. We extract the feature maps from the hidden layers for the samples with the maximum and minimum predicted probability of purchasing and of not purchasing. They are shown in Figs. 6, 7, 8 and 9.

Fig. 6. Feature maps of maximum probability purchasing from last convolutional layer. (Color figure online)

Fig. 7. Feature maps of minimum probability purchasing from last convolutional layer. (Color figure online)

Fig. 8. Feature maps of maximum probability not purchasing from last convolutional layer. (Color figure online)

Fig. 9. Feature maps of minimum probability not purchasing from last convolutional layer. (Color figure online)

These figures show how the network extracts features from the input customer behavior log data. Using them, we can monitor the signs of purchasing or not purchasing. We use the CNN to map time-series customer behaviors within a month and to grasp purchase or non-purchase signs. Each pixel expresses an activation value: the closer the color is to blue, the higher the activation, and the closer to red, the lower. As can be seen in Figs. 6 and 7, customer behaviors of the purchase class show time-series features at the left edge. Namely, when customers are active over time on news pages, outlet pages, gear pages, and old item pages, the probability of purchase is high. On the other hand, when behaviors are active only at scattered points within the month, the probability of purchase is low. Considering the kinds of pages for the maximum probability of purchase, sustained time-series activity on news pages or outlet pages can be regarded as a purchase sign; when customer behaviors are active only at the end of the month, the probability of purchase is low. In terms of the non-purchase class, when customer behaviors are active on pages such as new item pages or sale pages, the probability of purchase is low; these customers may use this EC site only for viewing products. Customers with the minimum probability of not purchasing are active around the center of the data. Thus, they appear to be active users who reserve golf courses and use this site as a reservation site rather than as an EC site.
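As an illustration, continuing the tf.keras sketches above (with the final convolutional layer named "last_conv"), feature maps such as those in Figs. 6, 7, 8 and 9 could be extracted as follows. Averaging over the channels and the colormap choice (blue = high activation, red = low, following the description above) are our assumptions.

```python
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Model that returns both the class probabilities and the last conv-layer activations.
feature_model = tf.keras.Model(inputs=model.input,
                               outputs=[model.output,
                                        model.get_layer("last_conv").output])

probs, feature_maps = feature_model.predict(X_test[..., np.newaxis])
purchase_prob = probs[:, 1]

# Samples with the maximum / minimum predicted purchase probability.
idx_max, idx_min = purchase_prob.argmax(), purchase_prob.argmin()

for title, idx in [("max purchase probability", idx_max), ("min purchase probability", idx_min)]:
    fmap = feature_maps[idx].mean(axis=-1)   # average over the 64 channels (assumed)
    plt.figure()
    plt.imshow(fmap, cmap="coolwarm_r")      # blue = high activation, red = low
    plt.title(title)
    plt.xlabel("page type")
    plt.ylabel("day of month")
plt.show()
```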

7 Conclusion

In this study, we proposed a CNN model to predict purchases and to grasp purchase signs. In particular, by visualizing the hidden layers, we could grasp customer behavior from time-series access log data using a CNN. This study thus expands the applicability of CNNs to the discrimination of non-image data. However, some issues remain, for example, further interpretation of the characteristics of the hidden layers and improving model accuracy. These are left for future work.