Time is money: Dynamic-model-based time series data-mining for correlation analysis of commodity sales

https://doi.org/10.1016/j.cam.2019.112659

Highlights

  • The right commodities can be recommended to the right customers at the right time by the proposed method.

  • The dynamic model reflects the relationships among goods at different observation time points.

  • Knowing the start time and duration of correlations among goods can improve sales quantity.

Abstract

The correlation analysis of commodity sales is very important in cross-marketing. A dynamic-model-based time series data-mining method is proposed to analyze the sales correlations among different commodities. A dynamic model comprises several distance models in different observation windows for a time series database that is transformed from a commodity transaction database. Sales correlations between two time series occur at different times, and this may produce valuable rules and knowledge for those who wish to practice cross-marketing and earn greater profits. Hence, the observation time points, which denote when a sales correlation occurs, constitute important information. By leveraging time series data-mining techniques, the dynamic model can uncover which commodities have similar sales trends and how those trends change within a particular time period, so that the “right” commodities can be recommended to the “right” customers at the “right” time. Moreover, the time periods in which similar sales patterns occur can be used to retrieve more valuable information, which can in turn be used to increase the sales of the correlated commodities and improve market share and profits. Analysis results on retail commodity datasets indicate that the proposed method takes the time factor into consideration and can uncover interesting sales patterns with which to improve cross-marketing quality. Moreover, the algorithm can serve as an intelligent component of recommendation and marketing systems, so that a human–computer interaction system can make intelligent decisions.

Introduction

Market basket analysis, which is used to analyze customer behaviors with respect to the purchase of commodities, is one of the most popular areas in the field of data-mining ([1], [2], [3], [4]). One of the most well-known cases is the story of beer and diapers, in which a correlation was found between these commodities purchased by Walmart customers; it is also a well-known case in cross-marketing. The case tells us that data-mining techniques can uncover interesting patterns, rules, and knowledge from a large database ([5], [6], [7], [8]) and utilize them to guide marketing behaviors so as to further improve commodity sales. This process is part of an intelligent system. In particular, the future of retailing includes five key topic areas, among which “big data” collection and usage, as a core intelligent subsystem, constitutes a popular research direction that is attracting much attention.

In cross-marketing ([6], [10], [11], [12]), one needs to determine which items are bought together and the probability that a customer will buy them. In the era of big data, one of the most commonly used technologies is association rule mining (ARM) ([13], [14], [15], [16]), which can be used to draw interesting patterns about commodity correlation from transaction databases. The first kind of ARM method, the Apriori algorithm ([17], [18]), can determine the most frequently occurring itemsets in a transaction database. The user specifies a minimal support degree; a candidate itemset whose support degree is greater than this threshold is considered a frequent itemset. Another parameter is the minimal confidence degree, and rules whose confidence degree exceeds this value are considered strong association rules. The confidence degree can be considered the success rate of selling B given A, where {A,B} is the frequent itemset. If the confidence degree of the frequent itemset {A,B} is larger than the minimal confidence degree, A and B can be seen as correlated commodities. ARM is an important technique by which to recommend the right products to the right customers, in what is often referred to as a “cross-selling recommendation”. It is widely used in practice, in areas such as commodities promotion, banking cross-marketing, medical consumption, and web analysis. Besides ARM, other methods in the data-mining field, including classification, clustering, and prediction, are also used to pinpoint similar items and obtain related rules.
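The support and confidence degrees described above can be illustrated with a minimal sketch. It is not the Apriori algorithm itself: it simply enumerates every item pair, which is only practical for tiny examples. The function names, the toy transactions, and the threshold values are all assumptions made for illustration.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(A and B together) / support(A): estimated rate of selling B given A."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base

def strong_pair_rules(transactions, min_support, min_confidence):
    """Return (A, B, support, confidence) for item pairs exceeding both thresholds."""
    items = sorted({item for t in transactions for item in t})
    rules = []
    for a, b in combinations(items, 2):
        s = support({a, b}, transactions)
        if s <= min_support:
            continue  # {a, b} is not a frequent itemset
        for x, y in ((a, b), (b, a)):
            c = confidence({x}, {y}, transactions)
            if c > min_confidence:
                rules.append((x, y, s, c))
    return rules

# Hypothetical toy transactions (not taken from the paper's datasets).
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "cola"},
    {"diapers", "milk"},
    {"beer", "diapers", "milk"},
]
rules = strong_pair_rules(transactions, min_support=0.5, min_confidence=0.7)
for x, y, s, c in rules:
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

On this toy data only the pair {beer, diapers} passes both thresholds. Raising `min_support` above 0.6 would leave no rules at all, which illustrates how sensitive the outcome is to the two parameters.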

Existing ARM techniques used to uncover correlated commodities do have some disadvantages. (1) The classic Apriori method incurs large time costs when determining frequent itemsets in the transaction database. In particular, repeated input–output accesses create a heavy burden when executing the Apriori algorithm, which causes problems when mining a large transaction database. Although some variants of the Apriori algorithm ([18], [20], [23]) accelerate the calculation of frequent itemsets, they impose extra requirements, such as considerably more storage space. (2) It is difficult to choose two important parameters, namely the minimal support degree and the minimal confidence degree. Overly large parameter values lead to the production of zero rules, while overly small values increase both the number of candidate sets and the computational complexity. For these reasons, the discovery of valuable rules is sensitive to the two parameters, and it is difficult to set reasonable values in practice. (3) Time is a very important consideration, yet existing methods executed on a transaction database usually ignore the time (or time period) in which a customer purchases a commodity. Moreover, overly long time intervals in a transaction database can produce a large number of rules, while overly short intervals may produce zero rules. Therefore, determining the time intervals to be used in a transaction database is a challenge in itself. (4) The traditional method analyzes the purchasing behaviors of most customers, which reflect the relevance of goods. However, the purchasing behavior of customers tends to be unstable, because the purchase demand and behavior of even the same customer will change over time. In this case, the rules and knowledge derived through ARM may not be valuable in improving product sales and commodity-based profits.

As a technique used in recommendation systems, ARM is a means of determining correlated products in a transaction database. However, for the resulting rules to have any value, researchers should look into ways of reducing the aforementioned deficiencies. The perfect scenario in the cross-selling market is that the right products are sold to the right customers at the right time. In particular, the “right time” is very important, as it determines the time at which the commodity sales and the purchase behaviors of customers will happen concurrently. Some customers do indeed require the products that are recommended, but the time at which the recommendation is made may be the key factor in determining selling success. Even when some customers are in great need of the goods being sold, a recommendation may fail at that moment for any number of reasons (e.g., store congestion, lack of cash among consumers). Therefore, to some extent, determining the “right time” can increase sales and related profits.

Time series data-mining involves a group of intelligent techniques by which to “mine” valuable information and knowledge from time series datasets ([24], [25], [26], [27], [28], [29], [30], [31]). Popular tasks include time series clustering, time series classification, frequent sequence pattern recognition, anomaly detection, and time series prediction. These tasks are applied in many fields, such as the stock market, word recognition, and financial prediction. To the best of our knowledge, in market basket analysis, there is a dearth of good-quality research that focuses on the discovery of association rules and correlated commodities using time series data-mining techniques. Although several studies ([8], [21]) use traditional clustering to find similar time series in a transaction database, they do not resolve which items are correlated and at what time, and hence which items can be sold together at a particular time.

The motivations for our work are threefold: (1) we wish to determine how to transform the transaction records into sales time series that are suitable for determining correlated commodities via data-mining; (2) we look to uncover, in the absence of customer information, how to use time series data-mining to find related goods and thereby enhance the sale of goods. As customers’ privacy concerns increase ([32], [33]), it will become increasingly important to enhance market sales directly without compromising customer information; and (3) we look to pinpoint the optimal time for product promotion and cross-selling. According to analyses of historical commodity purchasing data, knowing when commodity sales are correlated is beneficial to selling the next commodity at the same time point. In other words, we can determine the right time, as per historical purchasing actions, to improve store layout and design.

In the current study, we propose combining a dynamic model with time series data-mining to support the correlation analysis of commodity sales. The dynamic model comprises several distance models that are created via a distance function. (The distance function measures the similarity between any two sequences that respectively come from two original time series.) According to the length of the time period or the observation time window, the original time series can be segmented into various subsequences. In this way, observation time windows of different lengths create different distance models, which in turn combine to form a dynamic model. In addition, time series clustering based on affinity propagation (AP) and K-nearest neighbors (KNNs) classification (i.e., similarity search) are combined with the dynamic model to complete the work. As such, the current study contributes to the literature by offering a means of retail sales analysis that is based on the dynamic model.
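One way to picture a dynamic model built from distance models is the following sketch. It rests on several assumptions that this excerpt does not confirm: Euclidean distance stands in for the distance function, the windows are non-overlapping, and the function names and sales figures are hypothetical.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length sequences (assumed distance function)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def distance_model(series, start, window):
    """Pairwise distances between all series restricted to [start, start + window)."""
    names = sorted(series)
    segments = {n: series[n][start:start + window] for n in names}
    return {
        (a, b): euclidean(segments[a], segments[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

def dynamic_model(series, window_lengths):
    """Map (window length, start point) to a distance model, using non-overlapping windows."""
    length = min(len(s) for s in series.values())
    model = {}
    for w in window_lengths:
        for start in range(0, length - w + 1, w):
            model[(w, start)] = distance_model(series, start, w)
    return model

# Hypothetical sales per period for three commodities.
sales = {
    "beer":    [5, 6, 7, 8, 2, 2, 3, 2],
    "diapers": [5, 6, 7, 8, 9, 9, 8, 9],
    "milk":    [1, 1, 2, 1, 2, 2, 3, 2],
}
model = dynamic_model(sales, window_lengths=[4, 8])
for (w, start), distances in sorted(model.items()):
    closest = min(distances, key=distances.get)
    print(f"window={w}, start={start}: closest pair {closest}")
```

In this toy data, beer tracks diapers in the first four periods and milk in the last four. A single distance computed over all eight periods would not expose that shift, which is the reason for keeping one distance model per observation window.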

Section snippets

Some related techniques

Before presenting the proposed method, let us review some related time series data-mining techniques, namely time series similarity measurement, AP clustering, and KNN similarity search.

Model construction

A dynamic model is a key component used to conduct correlation analysis with respect to commodity sales, using time series data-mining. It comprises distance models produced in different observation time windows for commodity sales time series that are transformed from a transaction database. As Fig. 3 shows, the commodity sales correlation analysis process involves four steps. First, data transformation converts transaction record data into commodity sales time series data that take into
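The data transformation step can be sketched as follows. This is only an illustration under assumed conventions: each transaction record is taken to be a (day, item, quantity) tuple, sales are aggregated into fixed weekly periods, and the function name `to_sales_series` and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date

def to_sales_series(records, start, periods, days_per_period=7):
    """Aggregate (day, item, quantity) records into one fixed-length series per item."""
    series = defaultdict(lambda: [0] * periods)
    for day, item, quantity in records:
        index = (day - start).days // days_per_period
        if 0 <= index < periods:  # ignore records outside the observed range
            series[item][index] += quantity
    return dict(series)

# Hypothetical transaction records.
records = [
    (date(2019, 1, 1), "beer", 2),
    (date(2019, 1, 3), "diapers", 1),
    (date(2019, 1, 9), "beer", 4),
    (date(2019, 1, 10), "beer", 1),
    (date(2019, 1, 16), "diapers", 3),
]
series = to_sales_series(records, start=date(2019, 1, 1), periods=3)
print(series)
```

Periods with no sales stay at zero, so every commodity ends up with a series of the same length, which a point-by-point distance function requires.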

Commodity sales analysis based on the dynamic model

A dynamic model, as mentioned, comprises a number of distance models, and it can be applied to the field of time series data-mining to undertake a correlation analysis of commodity sales. In this section, we combine the clustering algorithm based on AP and the KNNs with a dynamic model to analyze the sales correlation among different commodities. In addition, two retail datasets are leveraged to illustrate the application of the dynamic model to a correlation analysis of commodity sales.
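A KNN similarity search within a single observation window might look like the sketch below. Euclidean distance, the function name `knn`, and the sales figures are assumptions for illustration. AP clustering is not shown here.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length sequences (assumed distance function)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn(query, series, start, window, k):
    """Return the k commodities whose sales in the window are closest to the query's."""
    target = series[query][start:start + window]
    scored = [
        (euclidean(target, values[start:start + window]), name)
        for name, values in series.items()
        if name != query
    ]
    return [name for _, name in sorted(scored)[:k]]

# Hypothetical sales per period for three commodities.
sales = {
    "beer":    [5, 6, 7, 8, 2, 2, 3, 2],
    "diapers": [5, 6, 7, 8, 9, 9, 8, 9],
    "milk":    [1, 1, 2, 1, 2, 2, 3, 2],
}
# Nearest neighbour of "beer" at two observation time points, window length 4.
neighbours = {start: knn("beer", sales, start, window=4, k=1) for start in (0, 4)}
print(neighbours)
```

Running the same query at different observation time points shows both which commodity is most similar and when the similarity holds, which is the kind of information a cross-selling recommendation needs.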

Conclusions

In the current study, we propose a dynamic model that uses time series data-mining to uncover correlations in commodity sales. In view of the disadvantages of traditional methods like association rule mining (ARM), the dynamic model, which comprises several distance models, is applied to analyze correlations in commodity sales in different observation time windows and at various observation time points. (The observation time window indicates for how long the correlation of

Acknowledgments

This work was supported by the National Natural Science Foundation of China [grant numbers 71771094, 61300139]; Project of Science and Technology Plan of Fujian Province of China [grant number 2019J01067]; Promotion Program for Young, Middle-aged Teacher in Science and Technology Research of Huaqiao University, China [grant number ZQN-PY220], and Ministry of Science & Technology, Taiwan [grant number MOST 108-2511-H-003-034-MY2].

References (51)

  • Sun L. et al. Unsupervised EEG feature extraction based on echo state network. Inform. Sci. (2019)
  • Majumdar K. et al. A geometric analysis of time series leading to information encoding and a new entropy measure. J. Comput. Appl. Math. (2018)
  • Sun Y. et al. An improvement of symbolic aggregate approximation distance measure for time series. Neurocomputing (2014)
  • Zhang Z. et al. Dynamic time warping under limited warping path length. Inform. Sci. (2017)
  • Wan Y. et al. Adaptive cost dynamic time warping distance in time series analysis for classification. J. Comput. Appl. Math. (2017)
  • Wang L.C. et al. Data-driven resource management for ultra-dense small cells: an affinity propagation clustering approach. IEEE Trans. Netw. Sci. Eng. (2018)
  • Wu X. et al. Data mining with big data. IEEE Trans. Knowl. Data Eng. (2014)
  • Aguinis H. et al. Using market basket analysis in management research. J. Manag. (2013)
  • Sagin A.N. et al. Determination of association rules with market basket analysis: application in the retail sector. Southeast Eur. J. Soft Comput. (2018)
  • Sahoo J. et al. An effective association rule mining scheme using a new generic basis. Knowl. Inf. Syst. (2015)
  • Tan S.C. et al. Time series clustering: a superior alternative for market basket analysis
  • S. Bell, R. Fernandes, V. D’agostino, Method and system for cross-marketing products and services over a distributed...
  • Salem M.Z. Effects of perfume packaging on Basque female consumers purchase decision in Spain. Manage. Decis. (2018)
  • Gupta S. et al. A survey on association rule mining in market basket analysis. Int. J. Inf. Comput. Technol. (2014)
  • Abdulsalam S. et al. Data mining in market basket transaction: an association rule mining approach. Int. J. Appl. Inf. Syst. (2014)