Elsevier

Knowledge-Based Systems

Volume 145, 1 April 2018, Pages 1-14

Efficiently mining high utility itemsets with negative unit profits

https://doi.org/10.1016/j.knosys.2017.12.035

Abstract

High Utility Itemset (HUI) mining is an important problem in the data mining literature that considers utilities of items (such as profits and margins) to discover interesting patterns from transactional databases. Several data structures, pruning strategies and algorithms have been proposed in the literature to efficiently mine high utility itemsets. Most of these works, however, do not consider itemsets with negative unit profits, which give a decision maker greater flexibility in determining profitable itemsets. This paper aims to advance the state-of-the-art and presents a generalized high utility mining (GHUM) method that considers both positive and negative unit profits. The proposed method uses a simplified utility-list data structure for storing itemset information during the mining process. The paper also introduces a novel utility based anti-monotonic property to improve the performance of HUI mining. Furthermore, GHUM adapts key pruning strategies from the basic HUI mining literature and presents new pruning strategies to significantly improve the performance of mining. The proposed method is evaluated on a set of benchmark sparse and dense datasets and compared against a state-of-the-art method. Rigorous experimental evaluation is performed and implications of the key findings are also presented. In general, GHUM was found to deliver more than an order of magnitude improvement in performance, at a fraction of the memory, over the state-of-the-art FHN method.

Introduction

The High Utility Itemset (HUI) mining problem [1], [2], [3], [4], [5], [6] involves the use of item utilities to discover profitable itemsets from a transactional database. It considers both the internal and external utilities of items. The problem has received significant attention in recent years due to its potential applicability in numerous business and scientific applications.

The high utility itemset mining problem is a generalization of the frequent itemset mining [7] problem. The frequent itemset mining problem uses the notion of support (or item co-occurrence frequencies) to discover interesting patterns. Numerous algorithms have been proposed in the literature to efficiently mine frequent itemsets. Such algorithms predominantly employ a support based anti-monotonic property to efficiently mine interesting patterns. The utility of an itemset, however, does not satisfy the anti-monotonic property [1], [8]. Hence, the HUI mining problem is considerably harder and more computationally challenging [1].
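
To make the contrast concrete, the following minimal sketch uses a hypothetical four-transaction database; the items, quantities, and unit profits are invented for illustration and are not taken from the paper. It assumes the usual convention that the utility of an itemset in a transaction is the sum of quantity times unit profit over its items. Support can only shrink as an itemset is extended, whereas utility can rise and then fall.

```python
# Hypothetical toy database: each transaction maps item -> purchased quantity.
DB = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 3},
    {"b": 4},
]
# Hypothetical unit profits (all positive in this example).
PROFIT = {"a": 1, "b": 5, "c": 2}


def contains(transaction, itemset):
    """True if every item of the itemset occurs in the transaction."""
    return all(item in transaction for item in itemset)


def support(itemset):
    """Number of transactions that contain the whole itemset."""
    return sum(1 for t in DB if contains(t, itemset))


def utility(itemset):
    """Sum of quantity * unit profit over all transactions containing the itemset."""
    return sum(
        sum(t[item] * PROFIT[item] for item in itemset)
        for t in DB
        if contains(t, itemset)
    )


for itemset in (("a",), ("a", "b"), ("a", "b", "c")):
    print(itemset, "support =", support(itemset), "utility =", utility(itemset))
```

In this toy data the support falls from 3 to 2 to 1 along the chain {a}, {a, b}, {a, b, c}, while the utility goes from 4 up to 17 and back down to 12, so a low-utility itemset cannot be discarded on the grounds that its supersets must also be low.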

The high utility mining methods in the literature can be broadly categorized as level-wise [1], [8], [9], tree-based [2], [10], [11], [12], utility list based [3], [4], [5], [13], hyperlink based [14] and projection based [6] methods. Most of the current works in the literature support only items with positive unit profits. However, in most real-life situations, there is a need to consider items with both positive and negative unit profits or margins. For example, supermarket firms like Walmart often run hundreds or thousands of cross-product promotional campaigns per month. Such campaigns often involve offering products at everyday low pricing (EDLP), at a discounted price (that might lead to a negative margin), as free products (negative profit), or as bundled offerings (a mix of discounted and non-discounted products). The additional costs (or losses) incurred on individual items that are part of a promotion are insignificant if the overall promotional campaign delivers profitable outcomes. In essence, a firm is interested in choosing the bundle of products (or itemsets) that maximizes its overall profitability. The state-of-the-art HUI mining methods like HUI-Miner [3], FHM [4], and EFIM [6] cannot be directly used for handling such problems that require consideration of items with both positive and negative unit profits to maximize overall profitability.
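
The arithmetic behind the promotional example can be sketched as follows. The products, quantities, and unit profits are hypothetical and chosen only to show a loss-making item sitting inside a profitable bundle.

```python
# Hypothetical unit profits: the printer is sold below cost as a promotion.
UNIT_PROFIT = {"printer": -20, "ink": 15}

# Hypothetical basket: item -> purchased quantity.
basket = {"printer": 1, "ink": 3}


def bundle_profit(basket, items):
    """Profit contributed by the given items of one basket (quantity * unit profit)."""
    return sum(basket[item] * UNIT_PROFIT[item] for item in items)


print(bundle_profit(basket, ["printer"]))         # the promoted item alone
print(bundle_profit(basket, ["ink"]))             # the profitable item alone
print(bundle_profit(basket, ["printer", "ink"]))  # the bundle as a whole
```

Here the printer alone contributes -20, the ink alone contributes 45, and the bundle nets 25. A miner restricted to positive unit profits has no way to represent the first term.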

A few recent works in the literature [5], [15], [16] have made attempts to address the above problem. HUINIV-Mine [15] and FHN [5], [16] are the two key methods that consider both positive and negative unit profits while mining HUIs. HUINIV-Mine [15] uses a level-wise candidate generation and test approach. The more recent and more efficient FHN [5] method uses a utility list based data structure for mining HUIs with negative unit profits. The FHN [5] method is shown to be 2–3 orders of magnitude faster than the HUINIV-Mine [15] method. We argue that the state-of-the-art FHN method uses a relatively complex utility list data structure and does not exploit interesting anti-monotonic properties of items with negative unit profits. This paper introduces a novel anti-monotonic property and suggests a few new pruning strategies to significantly improve the efficiency of mining HUIs with negative unit profits. More specifically, the key novelties and contributions of this paper are as follows:

  1. Presents a new method, GHUM, for efficiently mining high utility itemsets with negative unit profits. The presented method uses a simplified utility list based data structure for efficiently storing itemset information.

  2. Introduces a novel anti-monotonic property of itemsets (A-Prune) for mining HUIs with negative unit profits. This property has not been explored, to the best of our knowledge, in the HUI mining literature.

  3. Several pruning properties have been proposed in the literature for efficiently mining HUIs with positive unit profits. However, most of these pruning properties cannot be directly applied to the new problem that considers both positive and negative unit profits. The proposed method adapts the key pruning properties (namely, U-Prune [3], LA-Prune [13]) for the new HUI mining problem and demonstrates their effectiveness.

  4. A utility list based HUI mining method requires expensive utility list intersections and candidate evaluations during the mining process. We propose a novel pruning strategy (N-Prune) to significantly reduce the total number of evaluations made during the mining process and improve the performance of HUI mining.

  5. Explores a few novel optimizations to improve the overall efficiency of HUI mining with negative unit profits. More specifically, the paper considers optimizations based on utility list compaction, support based sorting of items with negative unit profits, and dynamic sorting of items with negative unit profits. The experimental results clearly reveal the usefulness of the proposed optimizations.

  6. Substantial experimental evaluation is performed on a variety of benchmark dense and sparse datasets to demonstrate the utility of the proposed ideas. The proposed GHUM method was found to deliver an order of magnitude improvement, at a fraction of the memory, over the state-of-the-art FHN method.

The rest of the paper is organized as follows. Section 2 describes the related work and highlights some of the key gaps in the existing literature. Section 3 formally introduces the problem, and discusses the key definitions and notations used in the paper. The proposed algorithm and its pruning strategies are outlined in Section 4. Section 5 presents the experimental design and evaluation on several benchmark datasets. A comparative evaluation of GHUM against the state-of-the-art FHN method is also made in this section. Finally, Section 6 provides concluding remarks. The limitations and directions for further research are also presented.

Related literature

In this section, we review the literature on high utility mining with both positive and negative unit profits. Subsequently, we discuss the key differences of the proposed work from existing works in the literature.

Problem statement, definition and notation

We formally define the key terms in utility mining using the standard conventions followed in the literature [1], [3], [8], [10].

Let $I=\{i_1,i_2,\ldots,i_m\}$ be a set of distinct items. A set $X \subseteq I$ is referred to as an itemset. A transaction is $T_j=\{x_l \mid l=1,2,\ldots,N_j,\ x_l \in I\}$, where $N_j$ is the number of items in transaction $T_j$. A transaction database $D$ is a set of transactions, $D=\{T_1,T_2,\ldots,T_n\}$, where $n$ is the total number of transactions in the database. A sample transaction database $D$ is given in Table 1.
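
As a rough illustration of this notation, the sketch below encodes a small item universe and transaction database as Python sets. The data are hypothetical and do not reproduce Table 1; the names `ITEMS`, `D`, `is_itemset`, and `covering_transactions` are introduced here only for illustration.

```python
# Hypothetical item universe I and transaction database D (not the paper's Table 1).
ITEMS = {"a", "b", "c", "d"}
D = {
    1: {"a", "b"},
    2: {"b", "c", "d"},
    3: {"a", "b", "c"},
}


def is_itemset(x):
    """X is an itemset when X is a subset of I."""
    return set(x) <= ITEMS


def covering_transactions(x):
    """Identifiers j of the transactions T_j that contain every item of X."""
    return sorted(tid for tid, items in D.items() if set(x) <= items)


print(is_itemset({"b", "c"}))             # True: both items belong to I
print(is_itemset({"a", "e"}))             # False: "e" is not in I
print(covering_transactions({"b", "c"}))  # [2, 3]
```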

Definition 1

Each item $x_i \in I$ is

Our proposed method

In this section, we present a simplified utility list data structure, discuss the proposed pruning strategies and outline the key algorithm steps.
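
For orientation, the sketch below shows the classic utility list of HUI-Miner [3], restricted to positive unit profits. It is not GHUM's simplified structure, which this excerpt does not describe, and it does not handle negative items. Each entry holds a transaction identifier, the itemset's utility in that transaction (`iutil`), and the utility of the items that follow it in a fixed processing order (`rutil`). The database, profits, and item order are hypothetical, and the join covers only the simplest case of combining two single-item lists.

```python
from typing import Dict, List, Tuple

Entry = Tuple[int, int, int]  # (tid, iutil, rutil)

# Hypothetical data: transaction id -> {item: quantity}, positive profits only.
DB: Dict[int, Dict[str, int]] = {
    1: {"a": 1, "b": 2},
    2: {"a": 2, "c": 1},
    3: {"a": 1, "b": 1, "c": 3},
}
PROFIT = {"a": 1, "b": 5, "c": 2}
ORDER = ["a", "b", "c"]  # assumed processing order of the items


def utility_list(item: str) -> List[Entry]:
    """Build the utility list of a single item."""
    later_items = ORDER[ORDER.index(item) + 1:]
    entries = []
    for tid, t in sorted(DB.items()):
        if item in t:
            iutil = t[item] * PROFIT[item]
            rutil = sum(t[j] * PROFIT[j] for j in later_items if j in t)
            entries.append((tid, iutil, rutil))
    return entries


def join(ul_x: List[Entry], ul_y: List[Entry]) -> List[Entry]:
    """Utility list of {x, y} from single-item lists, with x before y in ORDER."""
    y_by_tid = {tid: (iu, ru) for tid, iu, ru in ul_y}
    return [
        (tid, iu + y_by_tid[tid][0], y_by_tid[tid][1])
        for tid, iu, _ in ul_x
        if tid in y_by_tid
    ]


def can_prune_extensions(ul: List[Entry], min_util: int) -> bool:
    """Remaining-utility bound: no extension can reach min_util if this holds."""
    return sum(iu + ru for _, iu, ru in ul) < min_util


ul_ab = join(utility_list("a"), utility_list("b"))
print(ul_ab)
print(can_prune_extensions(ul_ab, min_util=30))
```

The join touches only the transactions shared by both lists, which is why list intersections dominate the cost of utility list based miners, and the bound on `iutil + rutil` is the kind of check the paper refers to as U-Prune [3].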

Experimental results

We implemented the GHUM algorithm by extending the open-source data mining library SPMF [27]. All our experiments were conducted on an Intel Core i5-2500 machine with a 3.3 GHz CPU and 4 GB of memory, running a Windows OS. To ensure the robustness of the results, we ran each experiment five times and report the average results.

Conclusion and future research directions

This paper presented a new utility mining method (GHUM) for efficiently mining high utility itemsets with negative unit profits. The method uses a simplified utility-list data structure to store utility information during the mining process. It adapts existing pruning strategies (generalized U-Prune and LA-Prune) and proposes two new pruning strategies (A-Prune and N-Prune) for efficient utility mining. The proposed method was found to be superior compared to the state-of-the-art FHN

References (27)

  • Y. Liu et al. A two-phase algorithm for fast discovery of high utility itemsets.

  • V.S. Tseng et al. Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans. Knowl. Data Eng. (2012)

  • M. Liu et al. Mining high utility itemsets without candidate generation. Proceedings of the 21st ACM International Conference on Information and Knowledge Management (2012)