Efficiently mining high utility itemsets with negative unit profits
Introduction
The High Utility Itemset (HUI) mining problem [1], [2], [3], [4], [5], [6] uses item utilities to discover profitable itemsets from a transactional database. It considers both the internal utility of an item (such as its purchase quantity in a transaction) and its external utility (such as its unit profit). The problem has received significant attention in recent years due to its potential applicability in numerous business and scientific applications.
The high utility itemset mining problem is a generalization of the frequent itemset mining [7] problem. The frequent itemset mining problem uses the notion of support (or item co-occurrence frequencies) to discover interesting patterns. Numerous algorithms have been proposed in the literature to efficiently mine frequent itemsets. Such algorithms predominantly rely on a support-based anti-monotonic property: the support of an itemset can never exceed the support of any of its subsets, so every superset of an infrequent itemset can be pruned. The utility of an itemset, however, does not satisfy the anti-monotonic property [1], [8]. Hence, the HUI mining problem is considerably harder and more computationally challenging [1].
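The contrast between support and utility can be checked on a toy example. The sketch below is illustrative only: the database, the unit profits, and the helper names (`support`, `utility`, `is_anti_monotone`) are assumptions made for this example and do not come from the paper.

```python
from itertools import combinations

# Hypothetical unit profits (external utility) per item.
PROFIT = {"a": 1, "b": 5, "c": 2}

# Hypothetical database: each transaction maps item -> purchased quantity
# (internal utility).
DB = [
    {"a": 1, "b": 1},
    {"a": 2, "c": 3},
    {"a": 1, "b": 2, "c": 1},
]


def support(itemset, db):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in db if all(i in t for i in itemset))


def utility(itemset, db, profit):
    """Sum of quantity * unit profit over the transactions containing the itemset."""
    return sum(
        sum(t[i] * profit[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def all_itemsets(items):
    """Every non-empty subset of the given items."""
    for k in range(1, len(items) + 1):
        for combo in combinations(sorted(items), k):
            yield frozenset(combo)


def is_anti_monotone(measure):
    """True if measure(Y) <= measure(X) for every proper subset X of Y."""
    sets = list(all_itemsets(PROFIT))
    return all(measure(y) <= measure(x) for x in sets for y in sets if x < y)


print(is_anti_monotone(lambda s: support(s, DB)))          # True
print(is_anti_monotone(lambda s: utility(s, DB, PROFIT)))  # False
print(utility({"a"}, DB, PROFIT), utility({"a", "b"}, DB, PROFIT))  # 4 17
```

Here {a, b} has a higher utility (17) than its subset {a} (4), so a low-utility itemset cannot be used to discard its supersets the way an infrequent itemset can.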
The high utility mining methods in the literature can be broadly categorized as level-wise [1], [8], [9], tree-based [2], [10], [11], [12], utility list based [3], [4], [5], [13], hyperlink based [14] and projection based [6] methods. Most of the current works in the literature support only items with positive unit profits. However, in most real-life situations, there is a need to consider items with both positive and negative unit profits or margins. For example, supermarket firms like Walmart often run hundreds or thousands of cross-product promotional campaigns per month. Such campaigns often involve offering products at everyday low pricing (EDLP), at a discounted price (that might lead to a negative margin), as free products (negative profit), or as bundled offerings (a mix of discounted and non-discounted products). The additional costs (or losses) incurred on individual items that are part of a promotion are insignificant if the overall promotional campaign delivers profitable outcomes. In essence, a firm is interested in choosing the bundle of products (or itemset) that maximizes its overall profitability. State-of-the-art HUI mining methods such as HUI-Miner [3], FHM [4], and EFIM [6] cannot be directly used for such problems, which require consideration of items with both positive and negative unit profits to maximize overall profitability.
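One reason positive-only methods break down can be illustrated with the transaction-weighted utility (TWU), the upper bound that many HUI miners use for pruning. This excerpt does not name TWU, so the sketch below is an assumption-laden illustration: the data, the `twu` helper, and its `positive_only` flag are hypothetical. With a loss-making item in a transaction, the plain TWU can fall below the true utility of an itemset and stops being a safe bound. Summing only the positive utilities, a remedy used in the literature on negative unit profits, restores the bound.

```python
# Hypothetical unit profits; item "b" is sold at a loss as part of a promotion.
PROFIT = {"a": 5, "b": -3, "c": 1}

# Hypothetical database: item -> purchased quantity per transaction.
DB = [
    {"a": 2, "b": 2},
    {"a": 1, "b": 1, "c": 4},
]


def utility(itemset, db):
    """Utility of an itemset: quantity * unit profit over supporting transactions."""
    return sum(
        sum(t[i] * PROFIT[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def twu(itemset, db, positive_only=False):
    """Sum of transaction utilities over the transactions containing the itemset.

    With positive_only=True, negative item utilities are ignored when
    computing each transaction utility (an assumption for this sketch).
    """
    total = 0
    for t in db:
        if all(i in t for i in itemset):
            utils = [q * PROFIT[i] for i, q in t.items()]
            total += sum(u for u in utils if u > 0) if positive_only else sum(utils)
    return total


print(utility({"a"}, DB))                  # 15
print(twu({"a"}, DB))                      # 10, below the true utility
print(twu({"a"}, DB, positive_only=True))  # 19, a valid upper bound again
print(utility({"a", "b"}, DB))             # 6, a profitable bundle despite item b
```

The last line mirrors the promotion scenario above: the bundle {a, b} remains profitable overall even though item b carries a negative unit profit.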
A few recent works in the literature [5], [15], [16] have attempted to address the above problem. HUINIV-Mine [15] and FHN [5], [16] are the two key methods that consider both positive and negative unit profits while mining HUIs. HUINIV-Mine [15] uses a level-wise candidate generation and test approach. The more recent and most efficient method, FHN [5], uses a utility list based data structure for mining HUIs with negative unit profits. FHN [5] is shown to be 2–3 orders of magnitude faster than HUINIV-Mine [15]. We argue that the state-of-the-art FHN method uses a relatively complex utility list data structure and does not exploit interesting anti-monotonic properties of items with negative unit profits. This paper introduces a novel anti-monotonic property and suggests a few new pruning strategies to significantly improve the efficiency of mining HUIs with negative unit profits. More specifically, the key novelties and contributions of this paper are as follows:
1. Presents a new method, GHUM, for efficiently mining high utility itemsets with negative unit profits. The presented method uses a simplified utility list based data structure for efficiently storing itemset information.
2. Introduces a novel anti-monotonic property of itemsets (A-Prune) for mining HUIs with negative unit profits. To the best of our knowledge, this property has not been explored in the HUI mining literature.
3. Several pruning properties have been proposed in the literature for efficiently mining HUIs with positive unit profits. However, most of these pruning properties cannot be directly applied to the new problem that considers both positive and negative unit profits. The proposed method adapts the key pruning properties (namely, U-Prune [3] and LA-Prune [13]) for the new HUI mining problem and demonstrates their effectiveness.
4. A utility list based HUI mining method requires expensive utility list intersections and candidate evaluations during the mining process. We propose a novel pruning strategy (N-Prune) to significantly reduce the total number of evaluations made during the mining process and improve the performance of HUI mining.
5. Explores a few novel optimizations to improve the overall efficiency of HUI mining with negative unit profits. More specifically, the paper considers optimizations based on utility list compaction, support based sorting of items with negative unit profits, and dynamic sorting of items with negative unit profits. The experimental results clearly reveal the usefulness of the proposed optimizations.
6. Substantial experimental evaluation is performed on a variety of benchmark dense and sparse datasets to demonstrate the utility of the proposed ideas. The proposed GHUM method was found to deliver an order of magnitude improvement, at a fraction of the memory, over the state-of-the-art FHN method.
The rest of the paper is organized as follows. Section 2 describes the related work and highlights some of the key gaps in the existing literature. Section 3 formally introduces the problem, and discusses the key definitions and notations used in the paper. The proposed algorithm and its pruning strategies are outlined in Section 4. Section 5 presents the experimental design and evaluation on several benchmark datasets. A comparative evaluation of GHUM against the state-of-the-art FHN method is also made in this section. Finally, Section 6 provides concluding remarks. The limitations and directions for further research are also presented.
Related literature
In this section, we review the literature on high utility mining with both positive and negative unit profits. Subsequently, we discuss the key differences of the proposed work from existing works in the literature.
Problem statement, definition and notation
We formally define the key terms in utility mining using the standard conventions followed in the literature [1], [3], [8], [10].
Let I = {x1, x2, …, xm} be a set of distinct items. A set X⊆I is referred to as an itemset. A transaction Tj ⊆ I is a set of items, where Nj is the number of items in transaction Tj. A transaction database D = {T1, T2, …, Tn} is a set of transactions, where n is the total number of transactions in the database. A sample transaction database D is given in Table 1.
Definition 1 Each item xi ∈ I is
Our proposed method
In this section, we present a simplified utility list data structure, discuss the proposed pruning strategies and outline the key algorithm steps.
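For orientation, the sketch below shows a generic utility list in the style of HUI-Miner [3], together with the intersection step that combines the lists of two single items. It is not GHUM's simplified structure, which this excerpt does not specify. The data, the processing order (positive items first, the negative item last), and the choice to count only positive utilities as remaining utility are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    tid: int    # transaction identifier
    iutil: int  # utility of the itemset in this transaction
    rutil: int  # utility of the items that follow it in the processing order


# Hypothetical data: item "b" has a negative unit profit.
PROFIT = {"a": 5, "c": 1, "b": -3}
ORDER = ["a", "c", "b"]  # assumed processing order
DB = [
    {"a": 2, "b": 2},
    {"a": 1, "b": 1, "c": 4},
]


def build_lists(db, profit, order):
    """Build one utility list per single item in a single database pass."""
    lists = {item: [] for item in order}
    for tid, t in enumerate(db):
        present = [i for i in order if i in t]
        remaining = 0
        # Walk backwards so `remaining` holds the utility of later items.
        for item in reversed(present):
            u = t[item] * profit[item]
            lists[item].append(Entry(tid, u, remaining))
            remaining += max(u, 0)  # assumption: only positive utilities count
    return lists


def join(px, py):
    """Intersect the lists of single items x and y (x precedes y in the order)."""
    by_tid = {e.tid: e for e in py}
    return [
        Entry(e.tid, e.iutil + by_tid[e.tid].iutil, by_tid[e.tid].rutil)
        for e in px
        if e.tid in by_tid
    ]


lists = build_lists(DB, PROFIT, ORDER)
ab = join(lists["a"], lists["b"])
print(ab)
print(sum(e.iutil for e in ab))  # utility of {a, b}: 6
```

The utility of an itemset is the sum of the `iutil` values in its list, so an itemset can be evaluated without rescanning the database. The price is one list intersection per candidate, which is the cost that pruning strategies aim to avoid.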
Experimental results
We implemented the GHUM algorithm by extending the open-source data mining library SPMF [27]. All our experiments were conducted on an Intel Core i5-2500 machine with a 3.3 GHz CPU and 4 GB of memory, running a Windows OS. To ensure the robustness of the results, we ran each experiment five times and report the average results.
Conclusion and future research directions
This paper presented a new utility mining method (GHUM) for efficiently mining high utility itemsets with negative unit profits. The method uses a simplified utility-list data structure to store utility information during the mining process. It adapts existing pruning strategies (generalized U-Prune and LA-Prune) and proposes two new pruning strategies (A-Prune and N-Prune) for efficient utility mining. The proposed method was found to be superior to the state-of-the-art FHN
References (27)
- et al. FHN: An efficient algorithm for mining high-utility itemsets with negative unit profits. Knowl. Based Syst. (2016)
- et al. Mining itemset utilities from transaction databases. Data Knowl. Eng. (2006)
- et al. Isolated items discarding strategy for discovering high utility itemsets. Data Knowl. Eng. (2008)
- Pruning strategies for mining high utility itemsets. Expert Syst. Appl. (2015)
- et al. An efficient algorithm for mining high utility itemsets with negative item values in large databases. Appl. Math. Comput. (2009)
- et al. Discovery of high utility itemsets from on-shelf time periods of products. Expert Syst. Appl. (2011)
- et al. On-shelf utility mining with negative item values. Expert Syst. Appl. (2014)
- et al. An efficient algorithm for mining the top-k high utility itemsets using novel threshold raising and pruning strategies. Knowl. Based Syst. (2016)
- et al. Efficient algorithms for mining high-utility itemsets in uncertain databases. Knowl. Based Syst. (2016)
- et al. Mining of high average-utility itemsets using novel list structure and pruning strategy. Future Gener. Comput. Syst. (2017)