Revenue prediction by mining frequent itemsets with customer analysis

https://doi.org/10.1016/j.engappai.2017.04.020

Abstract

Conventional frequent itemset mining does not consider the relative benefit or significance of transactions belonging to different customers. Therefore, frequent itemsets with high revenues cannot be discovered through the conventional approach. In this study, we extended the conventional association rule problem by associating a frequency–monetary (FM) weight with each transaction to reflect the interest or intensity of customer values, with a focus on revenue. Furthermore, we proposed a new algorithm for discovering frequent itemsets with high revenues from FM-weighted transactions with customer analysis. The experimental results from the survey data revealed that the top k frequent itemsets with high revenues discovered using the proposed approach outperformed those discovered using the conventional approach in predicting revenues from customers in next-period transactions.

Introduction

In the domain of knowledge discovery in data, association rule mining (ARM) is an important data mining approach that enables the discovery of consumer purchasing behaviors from transaction databases. Agrawal et al. (1993) first introduced the problem of ARM, defining it as identifying all rules in the transaction data that satisfy the minimum support and confidence constraints. The discovery of interesting associations or correlations is helpful in many business decision-making processes (Han and Kamber, 2006).
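The two constraints can be illustrated with a minimal Python sketch. The toy transactions, the item names, and the threshold values below are assumptions made for illustration only; they are not taken from the study.

```python
# Toy transaction database (hypothetical items, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]


def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    itemset = set(itemset)
    return sum(1 for t in db if itemset <= t) / len(db)


def confidence(antecedent, consequent, db):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, db) / support(antecedent, db)


# Assumed thresholds; in practice the analyst chooses them.
min_support, min_confidence = 0.5, 0.6

s = support({"bread", "milk"}, transactions)
c = confidence({"bread"}, {"milk"}, transactions)

# The rule bread -> milk is reported only if it meets both constraints.
rule_holds = s >= min_support and c >= min_confidence
```

Note that every transaction counts equally in `support`, which is the assumption the weighted formulation relaxes.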

However, general ARM does not consider the relative benefit or significance of transactions belonging to different customers; instead, it assumes that every customer is equally important and therefore carries equal weight during the mining process. Numerous studies in customer relationship management (CRM), by contrast, have revealed that customers differ in their contributions to businesses and to profit maximization. Therefore, customer value must be evaluated before effective marketing strategies are designed.

Businesses have started applying data mining technologies to marketing planning. Their objective is to gain customer loyalty and discover the contribution of customer value. Recency–frequency–monetary (RFM) analysis depends on recency (R), frequency (F), and monetary (M) measures and is one of the most popular database marketing metrics for quantifying customer transaction histories. RFM scoring is a method for determining the score of current customers on the basis of their R, F, and M values, and has been proven to be highly effective in marketing database applications (Blattberg et al., 2008). Moreover, RFM analysis is a well-known, behavior-based data mining method, which extracts customer profiles by using specific criteria. Recently, the RFM model has been used for CRM applications such as customer segmentation (Shim et al., 2012, Dursun and Caber, 2016).
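As a rough illustration of RFM scoring, the sketch below assigns each customer a score from 1 to 5 on each measure by rank. The five-level scale, the rank-based grouping, the helper name `rank_scores`, and the customer records are all assumptions chosen for the example; the scoring scheme used in the study may differ.

```python
def rank_scores(values, bins=5, higher_is_better=True):
    """Score each value from 1 to bins by rank, using equal-sized groups.

    Ties are broken by input order, which is a simplification.
    """
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = rank * bins // len(values) + 1
    return scores


# Hypothetical customers:
# (days since last purchase, number of purchases, total spend)
customers = {
    "c1": (3, 12, 840.0),
    "c2": (40, 2, 60.0),
    "c3": (10, 7, 300.0),
    "c4": (90, 1, 20.0),
    "c5": (1, 20, 1500.0),
}

ids = list(customers)
# A shorter time since the last purchase is better, so recency is inverted.
r = rank_scores([customers[c][0] for c in ids], higher_is_better=False)
f = rank_scores([customers[c][1] for c in ids])
m = rank_scores([customers[c][2] for c in ids])

rfm = {cid: (r[k], f[k], m[k]) for k, cid in enumerate(ids)}
```

Dropping the first component of each tuple leaves only the frequency and monetary scores.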

Because RFM analysis and market basket analysis (i.e., frequent pattern mining) are the two most important tasks in database marketing, this study extended the conventional association rule problem by associating a customer value (i.e., frequency–monetary (FM) weight, which is determined by applying the FM scoring method) with a transaction to reflect the interest or intensity of customer values. This facilitates the association of an FM weight parameter with each transaction, enabling the discovery of valuable patterns. In addition, we propose a new frequent itemsets frequency–monetary (FIFM)-weighted algorithm for identifying frequent itemsets from FM-weighted transactions for the prediction of customer revenue.
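To make the idea of an FM-weighted transaction concrete, the sketch below compares ordinary support with a weighted support in which each transaction contributes its customer's weight. The normalization (weight of matching transactions divided by total weight), the function names, and the weight values are assumptions for illustration; this excerpt does not give the paper's exact definition.

```python
# Hypothetical FM-weighted transactions: (customer id, items, FM weight).
# The weights are made-up numbers standing in for customer value.
weighted_db = [
    ("c1", {"bread", "milk"}, 0.9),
    ("c1", {"bread", "wine"}, 0.9),
    ("c2", {"milk"}, 0.2),
    ("c3", {"bread", "milk"}, 0.4),
]


def plain_support(itemset, db):
    """Conventional support: every transaction counts equally."""
    itemset = set(itemset)
    return sum(1 for _, items, _ in db if itemset <= items) / len(db)


def fm_weighted_support(itemset, db):
    """Assumed definition: weight of matching transactions over total weight."""
    itemset = set(itemset)
    total = sum(w for _, _, w in db)
    matched = sum(w for _, items, w in db if itemset <= items)
    return matched / total


wine_plain = plain_support({"wine"}, weighted_db)
wine_weighted = fm_weighted_support({"wine"}, weighted_db)
milk_plain = plain_support({"milk"}, weighted_db)
milk_weighted = fm_weighted_support({"milk"}, weighted_db)
```

In this toy data, wine is bought only once but by the highest-weight customer, so its weighted support exceeds its plain support, while milk moves in the opposite direction.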

We addressed the following questions related to discovering frequent itemsets from FM-weighted transactions: (1) Do the top k frequent itemsets discovered using the proposed FIFM algorithm outperform those discovered using the conventional Apriori algorithm in terms of predicting customers’ purchasing itemsets? (2) Do the top k frequent itemsets discovered using the proposed FIFM algorithm outperform those discovered using the conventional Apriori algorithm in predicting customer revenue?

The remainder of this paper is organized as follows. A review of related work is presented in Section 2. The problem definitions are provided in Section 3. The proposed algorithm and an example are illustrated in Section 4. Section 5 uses survey data to demonstrate the usefulness of the proposed algorithm. Conclusions and future work are discussed in Section 6.

Section snippets

Related work

The main purpose of this study was to discover frequent itemsets from transaction data with customer values (FM weights). In this section, we mainly explore the problems and some techniques related to association rules and customer value (RFM value). Finally, we discuss the differences between the applications of the present study and a 2014 study by Hu and Yeh.

Problem definitions

In this section, we define the problem of discovering frequent itemsets from FM-weighted transactions. Let I = {it1, it2, …, itm} be a set of items. Let D be a set of database transactions in which each transaction T is a set of items such that T ⊆ I. A transaction T is considered to contain an itemset X if and only if X ⊆ T.

It is important to determine weight values of the transactions of customers before discovering frequent itemsets from FM-weighted transactions. For preventing attributes

Algorithm for mining frequent itemsets from FM-weighted transactions

We now explain the proposed approach (FIFM) and provide an example to illustrate the method for discovering the frequent (high-FMsup) itemsets from FM-weighted transactions.
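The FIFM procedure itself is not reproduced in this excerpt. As a stand-in, the sketch below shows a generic Apriori-style, level-wise search that uses a weighted support threshold. The function name `mine_weighted_itemsets`, the support definition, and the toy data are assumptions and should not be read as the authors' algorithm. Because the weights are non-negative, a superset can never have a higher weighted support than its subsets, so the usual Apriori pruning still applies.

```python
from itertools import combinations


def mine_weighted_itemsets(db, min_wsup):
    """Level-wise search over (items, weight) pairs.

    Returns a dict mapping each frequent itemset (frozenset) to its
    weighted support. Assumed support: matching weight / total weight.
    """
    total = sum(w for _, w in db)

    def wsup(itemset):
        return sum(w for items, w in db if itemset <= items) / total

    # Level 1: single items that meet the threshold.
    level = {}
    for item in sorted({i for items, _ in db for i in items}):
        single = frozenset([item])
        s = wsup(single)
        if s >= min_wsup:
            level[single] = s
    frequent = dict(level)

    k = 2
    while level:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(level), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level
                             for sub in combinations(c, k - 1))}
        level = {}
        for c in candidates:
            s = wsup(c)
            if s >= min_wsup:
                level[c] = s
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical weighted transactions: (items, FM weight).
db = [
    ({"bread", "milk"}, 0.9),
    ({"bread", "wine"}, 0.9),
    ({"milk"}, 0.2),
    ({"bread", "milk"}, 0.4),
]
frequent = mine_weighted_itemsets(db, min_wsup=0.5)
```

With a threshold of 0.5, the toy run keeps {bread}, {milk}, and {bread, milk}, and discards {wine}.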

Experimental results

We used three datasets (Foodmart, Grocery-POS, and Supermarket) to evaluate the performance of the proposed FIFM algorithm. The three datasets differ in their sales properties (such as products, customers, and purchasing transactions).

Conclusion

Because of their practicality, ARM algorithms have been used in various applications and data sets. This study is the first to introduce the most efficient method for discovering frequent itemsets from transactions by considering the customer's FM value. Furthermore, we proposed a new algorithm, FIFM, to discover frequent itemsets from FM-weighted transactions. Experimental results from the survey data reveal that the proposed approach can enable the discovery of interesting and valuable

Acknowledgements

This research was supported by the Ministry of Science and Technology of the Republic of China under contract MOST 105-2410-H-166-002.
