Abstract
With the rapid growth of e-commerce, a large number of online transactions are processed every day. In this paper, we take the initiative to conduct a systematic study of the challenging prediction problems of sales bursts. Here, we propose a novel model to detect bursts, find the bursty features, namely the start time of the burst, the peak value of the burst and the off-burst value, and predict the entire burst shape. Our model analyzes the features of similar sales bursts in the same category, and applies them to generate the prediction. We argue that the framework is capable of capturing the seasonal and categorical features of sales burst. Based on the real data from JD.com, we conduct extensive experiments and discover that the proposed model makes a relative MSE improvement of 71% and 30% over LSTM and ARMA.
You have full access to this open access chapter, Download conference paper PDF
1 Introduction
E-commerce websites have become an ubiquitous mechanism for online shopping. Devendra first defined that electronic commerce, commonly known as e-commerce or eCommerce, consists of the buying and selling of products or services over electronic system such as internet and other computer network [1]. The effects of e-commerce have reached all areas of business, from customer service to new product design [2].
According to statistics in 2015, turnover of E-commerce has reached 18.3 trillion in China, up by 36.5% [4]. This can also be seen from the huge trading volume on some special shopping festivals in China. In the shopping festival of 11th Nov, 2016, the single-day merchandise trade of tmall.com reached 912.17 billion yuan, up by 61%. For the logistics industry, the number of orders will have a corresponding explosive growth. Nowadays, the e-commerce platforms usually launch promotional activities on a particular category of products. Hence the products in the same category always show similar sales changes.
In e-commerce field, time series prediction is wildly used. A large number of methods have been developed and applied to time series forecasting problems, such as sophisticated statistical methods [16] and neural network [15]. For any product, the sales series can increase or fall sharply, which we called spikes or bursts. The prediction of bursts is beneficial in several aspects, such as the storage optimization of the suppliers and the stability maintanance of the e-commerce website. In the existing studies about burst prediction, most of them are concerned about predicting the bursty features. Few of them focus on the prediction of the entire sales burst shape.
In this paper, we study the task of analyzing the bursty features of product sales, and propose a model framework to predict sales burst. The major contributions of this paper include:
-
1.
We define a new problem of time series prediction that mainly concerns about the bursts. To formulate this problem, we split it into three parts: detecting bursts, predicting the features and predicting the shape of bursts.
-
2.
We propose a novel framework to capture the seasonal and categorical features for predicting the entire burst series. We also take the initiative to use the reshaped nearest neighbors to simulate the burst shape.
-
3.
We conduct extensive experiments on real datasets from sale records on JD.com and evaluate the advantages and characteristics of the proposed model. The results show that our model has a significant improvement.
The rest of the paper is organized as follows. In Sect. 2 we present the background and relevent literature of the problem studied. In Sect. 3 we give the problem formulation and notations. Datasets with the preprocessing steps are shown in Sect. 4. In Sect. 5 we give the intuition of our model and describe the layers of our model framework. Experimental results on real life data are shown in Sect. 6. Finally, we conclude our work in Sect. 7.
2 Related Work
In recent years, many methods have been developed and applied to time series forecasting problems, such as sophisticated statistical methods [16] and neural networks [15]. Schaidnagel et al. [14] presented a parametrized time series algorithm that predicts sales volumes with variable product prices and low data support.
Burst, defined as “a brief period of intensive activity followed by long period of nothingness” [3], is a common phenomenon in time series. Most existing studies about burst prediction, mainly applied in social networks. Lin et al. [11] proposed a framework to capture dynamics of hashtags based on their topicality, interactivity, diversity, and prominence. Ma et al. [12] predicted the popularity of hashtags on daily basis. [8, 9] introduced a method to predict the burst of Twitter hashtags before they actually burst. Bursts of topics, sentiments and questions have been demonstrated to have a predictive power of product sales. Gruhl et al. [6] proposed to use online postings to predict spikes in sales rank. Such methods have been proved to be effective and achieved good performance. However, how to design a framework that can predict the entire burst shape, is yet to be answered. Inspired by the success of these methods, we divided the problem into several parts and investigate using a framework with three layers.
3 Problem Formulation and Notations
The sales of a product can be formed a time series \({<}x_1,x_2,...x_t,...{>}\). \(x_t\) donates the count of sales at the t-th time interval. Features of a sales burst contains the start time, start value, burst time, peak value, period, off-burst time, etc. Given a product p in category c, we consider a prediction scenario: a product p has a historical sales series x, at time t, will the sales of product burst in the near future? If the product will be a bursting one, what’s the shape of the entire burst?
In the rest of the paper, the technical details of the algorithms will be described mostly in vector forms. All the assumptions and notations mentioned in this paper are listed in Table 1.
4 Dataset and Set up
4.1 Dataset Description
To support the comprehensive experiments, we collected product information and transaction records from a real-life e-commerce website, JD.com. The original dataset contains two parts. The first part is a 139 million set of transaction-records from 1/1/2008 to 12/1/2013, including user ID, product ID, and the purchasing date. Single-day sale of one product ranges from 0 to 7802. The second part is a 0.25 million set of product information, including product ID, categories, brand and product name. There are 167 categories in total.
4.2 Preprocess
In order to prepare the datasets required in the framework, we conducted the following steps.
Filtering. Since random factors have a great impact on time series when the whole number of records is small, we select records from 2010 to 2013, which constitute 90% of all sales. Besides, we choose products with good selling frequency, using a threshold of 100 records in total.
Time Series Generation. For each product, We calculate the daily sales volume as the value of each node in the time series with the transaction records.
Smoothing. After the above process, we will get plenty of raw time series with a lot of sharp rise and fall. To detect bursts, we smooth the time series. In specific, we perform discrete kalman filter [7] with exponentially weighted moving average (EWMA) on each time series. The smoothed series in each category will form a new time series set.
Sample Bursts Extraction. For category c, we have time series set \(X_c\). Suppose \(X \in X_c, X = {x_1,x_2,...x_d}\), the sample bursts are extracted through the following steps. First, we find all the local maximums lmaxs and local minimums lmins for X. For the sequence between each local minimal pair, \((lmin_i, lmin_{i+1})\), there must be a \(lmax_i\). So we judge whether the sequence contains a burst by the slope of \((lmin_i, lmax_i)\). The threshold delta is computed by the standard deviation of X and window size k.
In the process of rise and fall, sales series may have small fluctuations. Once a new burst b is detected, we will judge whether it should be merged with the previous burst \(b^{'}\) with the features of b and \(b^{'}\). In specific, there are two situations when we choose to merge two bursts:
-
Burst \(b^{'}\) appeared in the falling process of burst b: merge b and \(b^{'}\) as a new burst \(b^{''} = (s,p,b,T+T^{'},e^{'})\)
-
Burst b appeared in the rising process of burst \(b^{'}\): merge b and \(b^{'}\) as a new burst \(b^{''} = (s,p^{'},b^{'},T+T^{'},e^{'})\)
Reshaping. Based on the assumption that products in the same category have similar bursty features, we propose to generate a new type of dataset for each category with the burst series. The burst series with period T in each category are reshaped into a fixed length m:
All reshaped bursts will form a new dataset \(\bar{X}_c\).
Hence, for each category c, we have three datasets: the smoothed time series dataset \(X_c\), the reshaped bursts dataset \(\bar{X}_c\), and the bursty feature dataset \(B_c\). The categories with less than N products (we set N as 100) are ignored in this process, and we keep 64 categories in total. Figure 1 shows the distribution of the bursty feature datasets.
5 The Model
5.1 Intuition
In the existing studies about burst prediction, most of them focus on how to predict the bursty features, rather than the entire burst. The performance of models are improved by the usage of various features. For sales time series, the extra information is much less than topics or hashtags on websites. Most of the time, we just have the transaction records and basic information about the products. When we are faced with a new product with no historical data, the prediction problem will become more difficult. In the absence of sufficient information and data, we need to figure out a proper approach to predict the entire burst shape. Next we will present an overview of the model framework.
5.2 Model Overview
We propose a model framework to predict the entire burst shape of online products, shown in Fig. 2. The model has three layers, namely the burst detection layer, bursty feature prediction layer, and the burst shape prediction layer. The final layer will generate the output prediction of the whole model. The model implements the following work flow:
5.3 Burst Detection
The burst detection layer aims to detect the start of a burst. For time series X, we let the window w slide with the timeline from the start point of X. At time t, we get a sequence \(s_t = {x_t, x_{t+1},..., x_{t+d}}\), and we will use this sequence as the input of the classification model. The sliding window is to simulate and solve the real situation in e-commercial websites.
For the classification part, we apply a SVM-based method to solve this problem. We first define a feature space based on the time series features in [10], and add some new features such as \(max_value\), \(min_value\), \(id_max\) and \(id_min\). The feature vector f is the input of the SVM model, and we will get the class label as output, indicating whether it will be a bursting one.
5.4 Bursty Feature Prediction
Once a burst has been detected, we want to know its bursty features, such as how high the peak will reach, when it will be off-burst and what the off-burst value is. In this layer, these features are predicted by three different regression model, named \(M^{(1)}, M^{(2)}\) and \(M^{(3)}\). The predicted results in \(M^{(i-1)}\) and f will form a new vector, as the input for \(M^{(i)}\). For each category c, We will train several different types of \(M^{(i)}\), and choose the one with smallest mean square error (MSE).
5.5 Burst Shape Prediction
The burst shape prediction layer aims to forecast the whole burst shape using the bursty features b and the time sequence \(s_t\). For each category c, a clustering model is pre-trained with the reshaped bursts dataset \(\bar{X}_c\).
The key idea is to use the corresponding part and the bursty features as the measure of similarities. First, with the predicted period T, the time sequence \(s_t\) will be reshaped into a new sequence of fixed length \(l = k\times (d/m)\), \(s_t^{'} = {s_1, s_{2},...,s_{l}}\). For each cluster center of reshaped bursts \(\bar{x} = {\bar{x}_1,\bar{x}_2,...,\bar{x}_m}\), we only use the same part \({\bar{x}_1,\bar{x}_2,...,\bar{x}_l}\) to calculate the similarity of two sequences and find the predicted cluster \(s_t\). There are plenty of methods to measure similarity, such as Euclidean distance, cosine similarity and so on. For time series, Dynamic Time Warping (DTW) [13] is good alternative, which is designed to find an optimal alignment between two given (time-dependent) sequences. Then, we can find all reshaped bursts in \(s_t\), and calculate new similarities between \(s_t^{'}\) and their corresponding parts. Here, the new similarity contains sequence similarity, absolute value of period similarity, peak similarity and off-burst value similarity. The weight of the first part is highest and for other parts we assign same weights. We choose the highest ranking k sequences, calculate its mean series \(\bar{x}^{'}\), and rescale it with the period T as the output prediction. The rescaling method uses the reverse process of reshaping: \(X = {x_1,x_2,...,x_T}, x_i = \bar{x}^{'}_k + (\bar{x}^{'}_{k+1} - \bar{x}^{'}_k) \times (i\times \frac{m}{T} - k), k = \lfloor i\times \frac{m}{T}\rfloor \).
6 Experiments and Evaluation
Setup. For each individual dataset, we randomly divide the samples into three folds: the training set, the validation set and the test set, with the proportion of 3:1:1. The model is trained using the training set, and is then tested on the validation set. Such cross-validation is performed on the same individual dataset for ten times with random splits, and the reported performance is the averaged value cross the ten iterations. Finally we test the model on the test set and report the performance.
6.1 Evaluation of the Burst Detection Layer
In this layer, we set a maximum number of samples as 30000 to reduce training time, and randomly select the same number of positive and negative samples from the training dataset. Before the training, we evaluate the contributions of features with a L1-based linear SVC model, and then reduce the dimensionality of input feature vector of the classifier, a SVC model with rbf kernel. Precision, Recall and F1-score are reasonable metrics for the evaluation of this layer.
Table 2 and Fig. 3 show the performance comparison of the first layer, detecting bursts. It can be observed that our model shows the best performance of F1-score for 90% categories listed in the table, and achieves the highest F1-score of 0.77 on average. Besides, our model achieves relatively high precision and recall scores on average, both scoring over 0.72. The performance of K-Nearest Neighbors is quite different, which achieves the best recall score 0.84 and the worst precision 0.63. SVM models with sigmoid kernel is the most ineffective and unstable one, with the deviations of all scores over 0.15. SVM models with linear kernel and rbf kernel always have similar performance.
In practical applications, there may be different requirements on precision and recall. These requirements can be satisfied by training optimal prediction models using different types of F-scores, and select the one with best score.
6.2 Evaluation of the Bursty Features Prediction Layer
For each category c, we train multiple regression models, select the best one, that is, the one with smallest MSE, as prediction model for current c. Using features we generated, we can find typical features for each c. In specific, we use the feature selection algorithm \(\chi ^2\), to compute the score of each feature, and select the \(K=10\) highest scoring features.
Table 3 studies the performance of the proposed method and the competitive methods on 64 individual datasets of different category, evaluated by average HitRate (HR)@20% and 50%, Mean Squared Error (MSE) and Mean absolute Relative Error (MRE). HitRate is the percentage of times when the relative error is within a specific range. For example, for a set of targets that equal 10, if 50% of time the predicted value is within [6–14], the HitRate@40% is 0.5.
The highlighted numbers in red, blue and black indicate the winners on each model under the corresponding metric. To compare the methods quantitatively, we also provide Fig. 4 (MRE/MSE is normalized with the maximum MRE/MSE among the methods in each entry).
Our model achieves the best performance of MSE and MRE and the highest score of HR@20% and HR@50% with a relatively low deviation. For example, in the task of the peak value prediction, we find our model’s performance and the average of other methods’ performance under MSE, MRE, HR@20% and HR@50% are 25.66 vs. 32.16 (unnormalized), 1.87 vs. 2.15 (unnormalized), 0.16 vs. 0.13 and 0.40 vs. 0.33 respectively, showing that our model makes a relative improvement of 20%, 13%, 23% and 21% respectively. In the evaluation of forecasting the burst period and off-burst value, our model has a optimal performance in HR@50%, reaching 0.68 and 0.38 on average.
It can also be observed that Linear Regression and Bayes Ridge Regression show similar performance on HR@20% and HR@50%. Linear SVR and SVR with rbf kernel have different performance under MSE and MRE. The CART method have the worst performance on MSE and MRE among all the methods.
6.3 Evaluation of the Burst Shape Prediction Layer
For this prediction task, we compare our model with two different models commonly used for time series prediction. The metric mean square error (MSE) is used to evaluate the prediction performance. We perform the Affinity Propagation Clustering [5] on the dataset of reshaped bursts. Euclidean distance is applied to calculate the input similarity matrix of the training set, and we choose DTW distance as the measure of similarity when forecasting. Besides, when the start value of the forecasted result and the last value in the window differ greatly, we apply a decay function to smooth the 30% of the predicted burst series, namely \(y_{30\%}\): \(y = \alpha f + (1-\alpha )g\). Here, we set f as the straight line from the last point in the window w to the last point of \(y_{30\%}\), and set g as \(y_{30\%}\). \(\alpha \) is set as the exponential function \(e^{(-1/3)t}\).
Table 4 and Fig. 5 shows the results of performance comparison. For the first two methods, we judge the performance on the real period of bursts. If the predicted period of our model is less than the real one, we set the left part with the average value of our prediction. We also draw some of the predicted results into Fig. 6. Conclusions can be drawn as follows:
-
The LSTM model shows the worst performance on average MSE, scoring 20.58. Besides, the performance of LSTM are not that stable, with relatively high deviations of 37.09.
-
Take all categories into account, our model wins 57 out of 64 times on the score of MSE. The average MSE of our method is 5.88, which significantly outperforms the other related methods, showing a relative improvement of 71% and 30%. The MSE standard deviations of our model and other methods are 37.10, 10.36 and 7.57 on average, indicating that our model makes a significant improvement by 80% and 29%.
7 Conclusion
In this paper, we take the initiative to propose a burst prediction framework of online product sales. The framework includes three layers: a burst detection layer, a bursty feature prediction layer and a burst shape prediction layer. The burst detection layer detect the start of a burst with a sliding window and a optimized classification model. The three bursty features, the burst peak, period and off-burst value, are predicted by different regression model with the best training score. The entire burst shape is generated by the burst series with similar seasonal features in the same category. Extensive experiments are conducted on real datasets from JD.com. We find that in average our framework achieves 4% to 45% advantage of F1-score in the classification and up to 73% improvement of HitRate@50% on the feature prediction against other methods. The result shows that the proposed solutions are effective to the burst prediction task, with an average improvement of 71% and 30% on MSE. We expect our framework to be of great value in e-commerce field.
References
Agrawal, D., Agrawal, R.P., Singh, J.B., Tripathi, S.P.: E-commerce: true indian picture. J. Adv. IT 3(4), 250–257 (2012)
Avery, S.: Online tool removes costs from process. Purchasing 123(6), 79–81 (1997)
Barabási, A.-L.: Bursts: The Hidden Patterns Behind Everything we do, from Your E-mail to Bloody Crusades. Penguin, New York (2010)
Cao, L., Zhang, Z.: The 2015 Annual China Electronic Commerce Market Data Monitoring Report. China Electronic Commerce Research Center, Hangzhou (2016)
Frey, B.J., Dueck, D.: Clustering by passing messages between data points. Science 315(5814), 972–976 (2007)
Gruhl, D., Guha, R., Kumar, R., Novak, J., Tomkins, A.: The predictive power of online chatter. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 78–87. ACM (2005)
Kalman, R.E.: A new approach to linear filtering and prediction problems. Trans. ASME-J. Basic Eng. 82(Series D), 35–45 (1960)
Kong, S., Mei, Q., Feng, L., Ye, F., Zhao, Z.: Predicting bursts and popularity of hashtags in real-time. In: Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 927–930. ACM (2014)
Kong, S., Mei, Q., Feng, L., Zhao, Z.: Real-time predicting bursting hashtags on Twitter. In: Li, F., Li, G., Hwang, S., Yao, B., Zhang, Z. (eds.) WAIM 2014. LNCS, vol. 8485, pp. 268–271. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08010-9_29
Kong, S., Mei, Q., Feng, L., Zhao, Z., Ye, F.: On the real-time prediction problems of bursting hashtags in twitter. arXiv preprint arXiv:1401.2018 (2014)
Lin, Y.-R., Margolin, D., Keegan, B., Baronchelli, A., Lazer, D.: # bigbirds never die: understanding social dynamics of emergent hashtag. arXiv preprint arXiv:1303.7144 (2013)
Ma, Z., Sun, A., Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter. J. Am. Soc. Inf. Sci. Technol. 64(7), 1399–1410 (2013)
Ratanamahatana, C.A., Keogh, E.: Making time-series classification more accurate using learned constraints. In: Proceedings of the 2004 SIAM International Conference on Data Mining, pp. 11–22. SIAM (2004)
Schaidnagel, M., Abele, C., Laux, F., Petrov, I.: Sales prediction with parametrized time series analysis. In: Proceedings DBKDA, pp. 166–173 (2013)
Thiesing, F.M., Vornberger, O.: Sales forecasting using neural networks. In: 1997 International Conference on Neural Networks, vol. 4, pp. 2125–2128. IEEE (1997)
Weigend, A.S.: Time Series Prediction: Forecasting the Future and Understanding the Past. Santa Fe Institute Studies in the Sciences of Complexity (1994)
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 61602487), the Fundamental Research Funds for the Central Universities, and the Research Funds of Renmin University of China (No. 2015030275).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Chen, R., Liu, J. (2017). A Novel Framework for Online Sales Burst Prediction. In: Altun, Y., et al. Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2017. Lecture Notes in Computer Science(), vol 10536. Springer, Cham. https://doi.org/10.1007/978-3-319-71273-4_1
Download citation
DOI: https://doi.org/10.1007/978-3-319-71273-4_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71272-7
Online ISBN: 978-3-319-71273-4
eBook Packages: Computer ScienceComputer Science (R0)