FHN: An efficient algorithm for mining high-utility itemsets with negative unit profits

doi:10.1016/j.knosys.2016.08.022

Knowledge-Based Systems

Volume 111, 1 November 2016, Pages 283-298

https://doi.org/10.1016/j.knosys.2016.08.022

Abstract

High utility itemset mining is an emerging data mining task that consists of discovering highly profitable itemsets (called high utility itemsets) in very large transactional databases. Many algorithms have been proposed to efficiently discover high utility itemsets, but most of them assume that items may only have positive unit profits. However, in real-world transactional databases, items (products) often have positive or negative unit profits. Mining high utility itemsets in a transactional database where items have positive or negative unit profits is a computationally expensive task, and it is thus desirable to design more efficient algorithms. To address this issue, we propose an efficient algorithm named FHN (Faster High-Utility itemset miner with Negative unit profits). It relies on a novel PNU-list (Positive-and-Negative Utility-list) structure to efficiently mine high utility itemsets while considering both positive and negative unit profits. In addition, several pruning strategies are introduced in FHN to reduce the number of candidate itemsets and thus enhance its performance. Extensive experimental results on both real-life and synthetic datasets show that the proposed FHN algorithm is in general two to three orders of magnitude faster and can use up to 200 times less memory than the state-of-the-art algorithm HUINIV-Mine. Moreover, it is shown that FHN performs especially well on dense datasets.

Introduction

Frequent Itemset Mining (FIM) [1], [11], [12], [22], [29] is a core data mining task that is essential to a wide range of applications. Given a transactional database containing a large number of transactions and a user-specified minimum support threshold, FIM aims at discovering frequent itemsets, that is, sets of items whose occurrence frequency is no less than the minimum support threshold [1]. However, an important limitation of FIM is that each item cannot appear more than once in a transaction and that all items are assumed to have the same importance (i.e., weight, cost, risk, unit profit or value). This assumption does not hold in many real-world applications. For example, transactions made at retail stores usually contain information about the purchase quantities of items, and items do not all have the same unit profit. If traditional FIM algorithms are applied to such a database, they discard this information and may thus discover many frequent itemsets that generate a low profit.

To address this issue, the problem of FIM has been redefined as High-Utility Itemset Mining (HUIM). HUIM considers both the purchase quantities of items in transactions and the unit profits of items to discover the itemsets in a database that generate a high profit (i.e., whose utility is no less than a minimum utility threshold). The discovered patterns are called high utility itemsets (HUIs). HUIM has many real-life applications such as website click stream analysis, cross-marketing in retail stores, and biomedical applications [2], [17], [21]. Many research issues related to HUIM have also emerged, such as high-utility stream data mining [19], high-utility episode mining [24], high-utility sequential pattern mining [27], [28], and high-utility sequential rule mining [30].
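The following Python sketch makes these notions concrete using the definitions commonly adopted in the HUIM literature. The utility of an item in a transaction is its purchase quantity multiplied by its unit profit. The utility of an itemset in a transaction is the sum of its items' utilities. The utility of an itemset in the database is that sum taken over the transactions that contain the itemset. The toy database `DB`, the unit profits `PROFIT` and all function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, List

# Hypothetical toy database: each transaction maps an item to its purchase quantity.
DB: List[Dict[str, int]] = [
    {"A": 3, "B": 1, "C": 2},
    {"A": 1, "C": 4},
    {"B": 2, "C": 1},
]
# Hypothetical unit profits (all positive here, as in classical HUIM).
PROFIT: Dict[str, int] = {"A": 5, "B": 2, "C": 1}


def contains(t: Dict[str, int], itemset: FrozenSet[str]) -> bool:
    """True if every item of the itemset appears in transaction t."""
    return all(i in t for i in itemset)


def utility_in(itemset: FrozenSet[str], t: Dict[str, int]) -> int:
    """u(X, T): sum of quantity * unit profit for the items of X (0 if T lacks X)."""
    if not contains(t, itemset):
        return 0
    return sum(t[i] * PROFIT[i] for i in itemset)


def utility(itemset: FrozenSet[str], db: List[Dict[str, int]]) -> int:
    """u(X): utility of X summed over all transactions of the database."""
    return sum(utility_in(itemset, t) for t in db)


def is_high_utility(itemset: FrozenSet[str], db: List[Dict[str, int]], min_util: int) -> bool:
    """X is a high utility itemset if u(X) is no less than the threshold."""
    return utility(itemset, db) >= min_util


print(utility(frozenset({"A", "C"}), DB))  # prints 26 (17 + 9)
print(is_high_utility(frozenset({"A"}), DB, 25))  # prints False (u({A}) = 20)
```

With `min_util = 25`, {A, C} is a high utility itemset while its subset {A} is not. This already hints at the lack of downward closure for utility.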

The problem of HUIM is widely recognized as more difficult than the problem of FIM. In FIM, the well-known downward-closure property states that the support of an itemset is anti-monotonic; that is, all supersets of an infrequent itemset are infrequent and all subsets of a frequent itemset are frequent. This property is very powerful for pruning the search space in FIM. In HUIM, however, the utility of an itemset is neither monotonic nor anti-monotonic: a high utility itemset may have supersets or subsets with lower, equal or higher utility [2], [18], [21]. Thus, search-space pruning techniques developed for FIM cannot be directly applied in HUIM.
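The contrast between the two measures can be checked by brute force on a small example. The sketch below uses a hypothetical toy database and unit profits. It verifies that support never increases when an itemset is extended, while utility may either increase or decrease.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List

# Hypothetical toy database (item -> quantity) and unit profits.
DB: List[Dict[str, int]] = [{"A": 3, "B": 1, "C": 2}, {"A": 1, "C": 4}, {"B": 2, "C": 1}]
PROFIT: Dict[str, int] = {"A": 5, "B": 2, "C": 1}


def support(x: FrozenSet[str]) -> int:
    """Number of transactions containing every item of x."""
    return sum(1 for t in DB if all(i in t for i in x))


def utility(x: FrozenSet[str]) -> int:
    """Sum of quantity * unit profit of x's items over the transactions containing x."""
    return sum(sum(t[i] * PROFIT[i] for i in x) for t in DB if all(i in t for i in x))


def itemsets(items: List[str]) -> Iterator[FrozenSet[str]]:
    """All non-empty itemsets over the given items."""
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


all_sets = list(itemsets(["A", "B", "C"]))
# Every (subset, proper superset) pair.
pairs = [(x, y) for x in all_sets for y in all_sets if x < y]

# Anti-monotonicity of support: a superset never has a higher support.
assert all(support(y) <= support(x) for x, y in pairs)

# Utility has no such property: it can grow or shrink when items are added.
grows = [(x, y) for x, y in pairs if utility(y) > utility(x)]
shrinks = [(x, y) for x, y in pairs if utility(y) < utility(x)]
print(utility(frozenset({"C"})), utility(frozenset({"A", "C"})))  # prints 7 26
print(utility(frozenset({"A"})), utility(frozenset({"A", "B"})))  # prints 20 17
```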

A popular approach for HUIM is to discover high-utility itemsets in two phases using the Transaction-Weighted Utilization (TWU) downward closure model [2], [18], [21]. This approach has been adopted by numerous algorithms such as Two-Phase [18], IHUP [2], UP-Growth and UP-Growth+ [21]. It consists of first generating, in Phase I, a set of candidate high-utility itemsets by overestimating their utility. Then, in Phase II, the algorithms perform an additional database scan to calculate the exact utility of the candidates and filter out low-utility itemsets.
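The sketch below illustrates the two-phase TWU approach on a hypothetical toy database with positive unit profits. The transaction utility TU(T) is the total utility of T, and TWU(X) sums TU over the transactions containing X. When all unit profits are positive, TWU(X) ≥ u(X) and TWU is anti-monotonic. Itemsets whose TWU falls below the threshold can therefore be pruned in Phase I together with all their supersets. Phase II then computes exact utilities. This is a simplified level-wise illustration, not the code of Two-Phase, IHUP or UP-Growth.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical toy database (item -> quantity) and positive unit profits.
DB: List[Dict[str, int]] = [{"A": 3, "B": 1, "C": 2}, {"A": 1, "C": 4}, {"B": 2, "C": 1}]
PROFIT: Dict[str, int] = {"A": 5, "B": 2, "C": 1}


def contains(t: Dict[str, int], x: FrozenSet[str]) -> bool:
    return all(i in t for i in x)


def tu(t: Dict[str, int]) -> int:
    """Transaction utility: total utility of all items in the transaction."""
    return sum(q * PROFIT[i] for i, q in t.items())


def twu(x: FrozenSet[str]) -> int:
    """Transaction-weighted utilization: sum of TU over transactions containing x."""
    return sum(tu(t) for t in DB if contains(t, x))


def utility(x: FrozenSet[str]) -> int:
    return sum(sum(t[i] * PROFIT[i] for i in x) for t in DB if contains(t, x))


def two_phase(min_util: int) -> Set[FrozenSet[str]]:
    items = sorted({i for t in DB for i in t})
    # Phase I: level-wise candidate generation using TWU as an overestimate.
    level = [frozenset({i}) for i in items if twu(frozenset({i})) >= min_util]
    candidates: List[FrozenSet[str]] = []
    while level:
        candidates.extend(level)
        # Join two k-itemsets sharing k-1 items into a (k+1)-itemset.
        joined = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
        level = [x for x in joined if twu(x) >= min_util]
    # Phase II: an extra pass computes exact utilities and filters out low-utility candidates.
    return {x for x in candidates if utility(x) >= min_util}


print(sorted(map(sorted, two_phase(25))))  # prints [['A', 'C']]
```

With a threshold of 25, Phase I keeps {A} (TWU 28), {C} (TWU 33) and {A, C} (TWU 28) as candidates, although only {A, C} turns out to be a HUI. {B} (TWU 24) and all its supersets are pruned.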

Although the TWU model has been widely used in HUIM, it suffers from an important drawback: it considers a huge number of low-utility itemsets as candidates, since it overestimates the utility of candidates using a loose upper bound derived from transaction utilities. Recently, a more efficient algorithm named HUI-Miner [17] was proposed to directly mine high-utility itemsets using a single database scan. Based on a vertical data structure (the utility-list), HUI-Miner uses the actual utility and the remaining utility of an itemset in the database to calculate a tighter upper bound, and thus prunes the search space more effectively. Experimental results have shown that HUI-Miner outperforms previous HUIM algorithms, and it is thus the current state-of-the-art algorithm for HUIM [17]. However, high-utility itemset mining remains very costly in terms of execution time, so designing more efficient algorithms remains an important challenge.
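The following compact sketch illustrates the utility-list idea. It is written from the description above and simplified, and it is not HUI-Miner's reference implementation. Each entry of an itemset's utility-list stores three values:

- a tid;
- the itemset's utility in that transaction (iutil);
- the utility of the items that come after it in a fixed processing order (rutil).

The sketch assumes ascending-TWU order, removes items whose TWU is below the threshold, and uses hypothetical toy data with positive unit profits only. Lists of longer itemsets are built by intersecting tids. An itemset is extended only if the sum of its iutil and rutil values reaches the threshold.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical toy database (item -> quantity) and positive unit profits.
DB: List[Dict[str, int]] = [{"A": 3, "B": 1, "C": 2}, {"A": 1, "C": 4}, {"B": 2, "C": 1}]
PROFIT: Dict[str, int] = {"A": 5, "B": 2, "C": 1}


@dataclass
class UtilityList:
    itemset: Tuple[str, ...]
    entries: Dict[int, Tuple[int, int]] = field(default_factory=dict)  # tid -> (iutil, rutil)

    def iutil(self) -> int:
        return sum(iu for iu, _ in self.entries.values())

    def rutil(self) -> int:
        return sum(ru for _, ru in self.entries.values())


def initial_lists(min_util: int) -> List[UtilityList]:
    tus = [sum(q * PROFIT[i] for i, q in t.items()) for t in DB]
    twu: Dict[str, int] = {}
    for t, tu in zip(DB, tus):
        for i in t:
            twu[i] = twu.get(i, 0) + tu
    # Drop unpromising items and fix the processing order (ascending TWU).
    order = sorted((i for i in twu if twu[i] >= min_util), key=lambda i: (twu[i], i))
    lists = {i: UtilityList((i,)) for i in order}
    for tid, t in enumerate(DB):
        kept = [i for i in order if i in t]
        remaining = sum(t[i] * PROFIT[i] for i in kept)
        for i in kept:
            u = t[i] * PROFIT[i]
            remaining -= u  # utility of the items after i in this transaction
            lists[i].entries[tid] = (u, remaining)
    return [lists[i] for i in order]


def construct(p: Optional[UtilityList], px: UtilityList, py: UtilityList) -> UtilityList:
    """Utility-list of Pxy from those of P, Px and Py (only tids shared by Px and Py)."""
    pxy = UtilityList(px.itemset + py.itemset[-1:])
    for tid, (iu_x, _) in px.entries.items():
        if tid in py.entries:
            iu_y, ru_y = py.entries[tid]
            iu_p = p.entries[tid][0] if p is not None else 0  # prefix counted twice
            pxy.entries[tid] = (iu_x + iu_y - iu_p, ru_y)
    return pxy


def search(p: Optional[UtilityList], lists: List[UtilityList], min_util: int,
           out: Set[FrozenSet[str]]) -> None:
    for k, x in enumerate(lists):
        if x.iutil() >= min_util:
            out.add(frozenset(x.itemset))
        # iutil + rutil bounds the utility of every extension of x.
        if x.iutil() + x.rutil() >= min_util:
            search(x, [construct(p, x, y) for y in lists[k + 1:]], min_util, out)


def hui_miner(min_util: int) -> Set[FrozenSet[str]]:
    out: Set[FrozenSet[str]] = set()
    search(None, initial_lists(min_util), min_util, out)
    return out
```

Unlike the two-phase sketch, no separate candidate-verification pass is needed here. The exact utility of each visited itemset is read directly from its list.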

Moreover, although many studies have been carried out to develop efficient algorithms for HUIM (e.g., Two-Phase [18], IHUP [2], UP-Growth [21], HUI-Miner [17], FHM [9], BAHUI [20] and HUP-Miner [10]), they are designed under the assumption that all items in transactional databases have positive weights/unit profits. Thus, most HUIM algorithms cannot be directly applied to mine HUIs when items have negative weights/unit profits, which commonly occur in real-life transaction databases. For example, if a customer buys three units of an item A in a supermarket, (s)he may receive one unit of item B for free as part of a promotion for product B. Now suppose that each unit of item A yields a profit of five dollars, and each unit of item B that is given away costs two dollars. Although giving away a unit of item B results in a loss of two dollars for the supermarket, selling three units of A together with the cross-promoted item B generates a profit of 15 dollars. Thus, the supermarket obtains a net gain of 13 dollars each time this promotion is applied.
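The arithmetic of this promotion can be expressed as the utility of a single transaction in which item B has a negative unit profit. The variable names below are illustrative only.

```python
# Unit profits in dollars; each free unit of B costs the store two dollars.
unit_profit = {"A": 5, "B": -2}
# One application of the promotion: three units of A sold, one unit of B given away.
transaction = {"A": 3, "B": 1}

# Utility of each item = quantity * unit profit (may be negative).
item_utility = {item: qty * unit_profit[item] for item, qty in transaction.items()}
net_gain = sum(item_utility.values())

print(item_utility)  # prints {'A': 15, 'B': -2}
print(net_gain)      # prints 13
```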

It was shown that if classical HUIM algorithms are applied to databases containing items with negative unit profits, they can generate an incomplete set of HUIs [4]. The reason is that these algorithms overestimate the utilities of itemsets to prune the search space. When items with negative unit profits are considered, however, these overestimations may become underestimations, and numerous HUIs may be incorrectly pruned. Recently, the HUINIV-Mine algorithm [4] was developed to handle the problem of HUIM with both positive and negative unit profits. The TS-HOUN algorithm [13] was then proposed, which considers both the on-shelf time periods of items and negative unit profits. Nevertheless, HUINIV-Mine [4] remains the state-of-the-art algorithm for mining HUIs with negative unit profits, and mining such HUIs remains very costly in terms of execution time and memory [4], [13]. Therefore, it is an important challenge to design more efficient algorithms that overcome these limitations.
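The sketch below reproduces this problem on a hypothetical two-transaction database. It uses the unit profits of the promotion example (A = 5, B = -2) and a threshold of 18. When all item utilities are summed into the transaction utility, the negative utility of B pulls TWU({A}) below the actual utility of {A]. A TWU-based algorithm would therefore wrongly prune this HUI. As an assumed remedy, given for illustration only (see [4] for HUINIV-Mine's exact definitions), the bound is recomputed using only the positive utilities in each transaction. Such a bound cannot fall below the utility of any itemset contained in that transaction.

```python
from typing import Dict, FrozenSet, List

# Hypothetical data: B has a negative unit profit.
UNIT_PROFIT: Dict[str, int] = {"A": 5, "B": -2}
DB: List[Dict[str, int]] = [{"A": 3, "B": 1}, {"A": 1, "B": 2}]


def u(item: str, t: Dict[str, int]) -> int:
    return t[item] * UNIT_PROFIT[item]


def containing(x: FrozenSet[str]) -> List[Dict[str, int]]:
    return [t for t in DB if all(i in t for i in x)]


def utility(x: FrozenSet[str]) -> int:
    return sum(sum(u(i, t) for i in x) for t in containing(x))


def twu(x: FrozenSet[str]) -> int:
    """Classical TWU: negative utilities reduce the transaction utility."""
    return sum(sum(u(i, t) for i in t) for t in containing(x))


def positive_twu(x: FrozenSet[str]) -> int:
    """Assumed fix: the transaction utility counts only positive item utilities."""
    return sum(sum(max(u(i, t), 0) for i in t) for t in containing(x))


A = frozenset({"A"})
print(utility(A), twu(A), positive_twu(A))  # prints 20 14 20
```

With `min_util = 18`, TWU({A}) = 14 would prune {A} even though u({A}) = 20. The positive-only bound (20) keeps it.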

In this paper, we address the challenge of designing a more efficient algorithm for discovering high utility itemsets in a transactional database while considering both positive and negative unit profits. We present a novel algorithm named FHN¹ (Fast High-utility itemset miner with Negative unit profits) to mine HUIs. Based on the designed vertical PNU-list data structure and several pruning strategies, FHN can efficiently handle negative unit profits. Experimental results on both real-life and synthetic datasets show that the proposed FHN algorithm is in general two to three orders of magnitude faster than the state-of-the-art HUINIV-Mine algorithm and performs well on dense datasets. The key contributions of this paper are as follows:

1.
A vertical list structure, called PNU-list (positive-and-negative utility-list), is designed to maintain all the information required for mining HUIs without performing multiple time-consuming database scans. The designed PNU-list structure allows FHN to directly mine HUIs without generating candidates.
2.
Two efficient pruning strategies named remaining utility pruning and EUCP pruning are further proposed to reduce the search space when using the PNU-list structure, and thus speed up the mining process for obtaining HUIs.
3.
A modified LA-Prune strategy is adopted in FHN to prune numerous unpromising candidates early when constructing PNU-lists.
4.
An extensive experimental study is carried out on several real-life datasets. Results show that the proposed algorithm outperforms the state-of-the-art HUINIV-Mine algorithm in terms of runtime, memory consumption and scalability.
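The Python sketch below illustrates how a PNU-list-based search could work. It is written from the description in this section under stated assumptions, and it is not the authors' implementation. The assumptions are:

- each entry of an itemset's list stores a tid, the itemset's positive utility, its negative utility, and the remaining positive utility of the items that follow it (the exact fields used by FHN may differ);
- positive items are processed before negative items, in ascending order of a positive-only TWU within each group;
- remaining utility pruning extends an itemset only when its positive utility plus its remaining positive utility reaches the threshold.

Class names, function names and the toy data are hypothetical, and EUCP and LA-Prune are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical toy data: B has a negative unit profit.
UNIT_PROFIT: Dict[str, int] = {"A": 5, "B": -2, "C": 1}
DB: List[Dict[str, int]] = [{"A": 3, "B": 1, "C": 2}, {"A": 1, "C": 4}, {"B": 2, "C": 1}]

Entry = Tuple[int, int, int]  # (positive utility, negative utility, remaining positive utility)


@dataclass
class PNUList:
    itemset: Tuple[str, ...]
    entries: Dict[int, Entry] = field(default_factory=dict)  # keyed by tid

    def utility(self) -> int:
        """Exact utility: positive plus negative parts over all transactions."""
        return sum(pos + neg for pos, neg, _ in self.entries.values())

    def upper_bound(self) -> int:
        """Positive utility plus remaining positive utility bounds all extensions."""
        return sum(pos + rem for pos, _, rem in self.entries.values())


def initial_lists(min_util: int) -> List[PNUList]:
    # Transaction utilities counting only positive item utilities.
    rtu = [sum(max(q * UNIT_PROFIT[i], 0) for i, q in t.items()) for t in DB]
    rtwu: Dict[str, int] = {}
    for t, r in zip(DB, rtu):
        for i in t:
            rtwu[i] = rtwu.get(i, 0) + r
    # Positive items first, then negative items; ascending bound within each group.
    order = sorted((i for i in rtwu if rtwu[i] >= min_util),
                   key=lambda i: (UNIT_PROFIT[i] < 0, rtwu[i], i))
    lists = {i: PNUList((i,)) for i in order}
    for tid, t in enumerate(DB):
        kept = [i for i in order if i in t]
        remaining = sum(max(t[i] * UNIT_PROFIT[i], 0) for i in kept)
        for i in kept:
            u = t[i] * UNIT_PROFIT[i]
            remaining -= max(u, 0)
            lists[i].entries[tid] = (max(u, 0), min(u, 0), remaining)
    return [lists[i] for i in order]


def construct(p: Optional[PNUList], px: PNUList, py: PNUList) -> PNUList:
    pxy = PNUList(px.itemset + py.itemset[-1:])
    for tid, (pos_x, neg_x, _) in px.entries.items():
        if tid not in py.entries:
            continue
        pos_y, neg_y, rem_y = py.entries[tid]
        pos_p, neg_p = (p.entries[tid][0], p.entries[tid][1]) if p is not None else (0, 0)
        # The prefix P is counted in both Px and Py, so it is subtracted once.
        pxy.entries[tid] = (pos_x + pos_y - pos_p, neg_x + neg_y - neg_p, rem_y)
    return pxy


def search(p: Optional[PNUList], lists: List[PNUList], min_util: int,
           out: Set[FrozenSet[str]]) -> None:
    for k, x in enumerate(lists):
        if x.utility() >= min_util:
            out.add(frozenset(x.itemset))
        if x.upper_bound() >= min_util:  # remaining utility pruning
            search(x, [construct(p, x, y) for y in lists[k + 1:]], min_util, out)


def mine_huis(min_util: int) -> Set[FrozenSet[str]]:
    out: Set[FrozenSet[str]] = set()
    search(None, initial_lists(min_util), min_util, out)
    return out


print(sorted(sorted(x) for x in mine_huis(14)))  # prints [['A'], ['A', 'B', 'C'], ['A', 'C']]
```

Negative items are placed last in this sketch. As a result, the remaining positive utility of an itemset never has to account for negative utilities, and the bound stays valid when an extension adds an item such as B.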

The rest of this paper is organized as follows. Related work is discussed in Section 2. The preliminaries and problem definition are given in Section 3. The proposed FHN algorithm is described in Section 4. An extensive experimental evaluation is presented in Section 5. Finally, the conclusion and future work are discussed in Section 6.

Section snippets

Related work

In this section, related work is discussed. The section reviews (1) the main approaches for frequent itemset mining, (2) previous work on high-utility itemset mining, and (3) state-of-the-art algorithms for mining high utility itemsets with negative unit profits.

Preliminaries and problem definition

In this section, we introduce some important preliminary definitions related to high utility itemset mining and formalize the problem of HUIM while considering negative unit profit values.

Definition 1 Transaction database

Let I be a set of items (symbols). An itemset is a set of items X ⊆ I, and is said to be of length k, or a k-itemset, if it contains k items. A transaction database is a set of transactions $D = \{T_{1}, T_{2}, \dots, T_{n}\}$ such that for each transaction T_c, T_c ⊆ I and T_c has a unique identifier c called its tid. Each

Proposed FHN algorithm

In this section, we propose the Faster High-Utility itemset miner with Negative unit profits (FHN) algorithm, based on a newly designed Positive-and-Negative Utility list (PNU-list) structure. Several pruning strategies are also designed to prune the search space early, thus speeding up the mining process. The PNU-list structure is inspired by the utility-list structure of HUI-Miner [17] but also has some key differences. Some properties of the designed approach for handling the negative item unit
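As an illustration of co-occurrence pruning in this setting, the sketch below builds an EUCS-like map in the spirit of FHM [9]. For each pair of items, the map stores an upper bound on the utility of any itemset that contains both items. This sketch assumes that the bound is computed from positive utilities only, so negative profits cannot turn it into an underestimate. A join that would produce an itemset ending with items x and y is skipped when the pair's value is below the threshold. Function names and data are hypothetical, and FHN's actual EUCP strategy may differ in its details.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List

# Hypothetical toy data: B has a negative unit profit.
UNIT_PROFIT: Dict[str, int] = {"A": 5, "B": -2, "C": 1}
DB: List[Dict[str, int]] = [{"A": 3, "B": 1, "C": 2}, {"A": 1, "C": 4}, {"B": 2, "C": 1}]


def build_eucs(db: List[Dict[str, int]]) -> Dict[FrozenSet[str], int]:
    """Map each co-occurring item pair to the sum of positive-only transaction utilities."""
    eucs: Dict[FrozenSet[str], int] = defaultdict(int)
    for t in db:
        positive_tu = sum(max(q * UNIT_PROFIT[i], 0) for i, q in t.items())
        for a, b in combinations(sorted(t), 2):
            eucs[frozenset((a, b))] += positive_tu
    return dict(eucs)


def may_join(eucs: Dict[FrozenSet[str], int], x: str, y: str, min_util: int) -> bool:
    """EUCP-style check: pairs that never co-occur, or whose bound is too low, are skipped."""
    return eucs.get(frozenset((x, y)), 0) >= min_util


EUCS = build_eucs(DB)
print(EUCS[frozenset(("A", "C"))], EUCS[frozenset(("A", "B"))])  # prints 26 17
print(may_join(EUCS, "A", "C", 20), may_join(EUCS, "A", "B", 20))  # prints True False
```

The check costs one dictionary lookup per candidate pair. It is therefore applied before the comparatively expensive construction of a new list.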

Experimental study

The goal of this paper is to propose a more efficient algorithm for mining HUIs when items may have positive or negative unit profits. In this section, we thus compare the performance of the proposed FHN algorithm against HUINIV-Mine, the state-of-the-art algorithm for this task. The experiments were implemented in Java and performed on a computer with a third-generation 64-bit Core i5 processor running the Windows 7 operating system, with 4 GB of free RAM. We compared the performance

Conclusion

In this paper, we have studied the problem of mining high utility itemsets from transactional databases with negative unit profits. Specifically, we have presented a novel Fast High-utility itemset miner with Negative unit profits (FHN) algorithm for mining high utility itemsets in databases where item unit profits may be positive or negative. A vertical list structure, called Positive-and-Negative Utility (PNU)-list, is designed for FHN so that it mines high utility itemsets without generating

Acknowledgement

This work is financed by a Natural Sciences and Engineering Research Council (NSERC) of Canada research grant and by the Tencent Project under grant CCF-TencentRAGR20140114.

References (30)

C.J. Chu et al.
An efficient algorithm for mining high utility itemsets with negative item values in large databases
Applied Math. Comput.
(2009)
S. Krishnamoorthy
Pruning strategies for mining high utility itemsets
Expert Systems with Applications
(2015)
G.C. Lan et al.
On-shelf utility mining with negative item values
Expert Systems with Applications
(2014)
G.C. Lan et al.
Discovery of high utility itemsets from on-shelf time periods of products
Expert Systems with Applications
(2011)
C.W. Lin et al.
An effective tree structure for mining high utility itemsets
Expert Syst. Appl.
(2011)
R. Agrawal et al.
Fast algorithms for mining association rules in large databases
Proc. Int. Conf. Very Large Databases
(1994)
C.F. Ahmed et al.
Efficient tree structures for high-utility pattern mining in incremental databases
IEEE Trans. Knowl. Data Eng.
(2009)
R. Chan et al.
Mining high-utility itemsets
Proc. ICDM03
(2003)
P. Fournier-Viger
FHN: efficient mining of high-utility itemsets with negative unit profits
Proc. 10th Int. Conf. on Advanced Data Mining and Applications
(2014)
P. Fournier-Viger et al.
SPMF: a java open-source pattern mining library
J. Mach. Learn. Res.
(2014)
P. Fournier-Viger et al.
VMSP: efficient vertical mining of maximal sequential patterns
Proc. 27th Canadian Conf. on Artificial Intelligence, Springer, LNAI
(2014)
P. Fournier-Viger et al.
Novel concise representations of high utility itemsets using generator patterns
Proc. 10th Int. Conf. on Advanced Data Mining and Applications
(2014)
P. Fournier-Viger et al.
FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning
Proc. 21st Intern. Symp. Methodologies Intell. Systems
(2014)
J. Han et al.
Mining frequent patterns without candidate generation: a frequent-pattern tree approach
Data Min. Knowl. Discov.
(2004)

Cited by (94)

An efficient method for mining High-Utility itemsets from unstable negative profit databases
2024, Expert Systems with Applications
The study of High-Utility Itemset Mining (HUIM) and Frequent Itemset Mining (FIM) is crucial since it explains consumer behavior and offers actionable advice to improve business results. HUIM algorithms have been successfully established to identify high-utility itemsets, including those with negative utilities. The problem with these approaches is that they presume incorrectly that items with negative utility across transactions would always be losses. Products with positive profitability may seem negative when combined with other items to increase sales or reduce inventory. Using strict upper-bound approaches, this paper presents strategies for making database scanning more efficient and reducing the number of prospective candidates. We also prove that it is correct to use the proposed upper-bounds for pruning on several types of items in the database. Based on all the proposed solutions, we develop a novel algorithm to solve this problem efficiently. To demonstrate their efficiency, the algorithms are tested against state-of-the-art HUIM algorithms on diverse datasets with regard to size and characteristics with unstable negative profits.
Parallel approaches to extract multi-level high utility itemsets from hierarchical transaction databases
2023, Knowledge-Based Systems
In the field of data mining, high utility itemset mining (HUIM) is a relevant mining task, with the aim of analyzing customer transaction databases. HUIM consists of exploiting the set of items that are often purchased together and yield high profit value. In real-world applications, transaction databases often come with item categorization, stored in a taxonomy. Items in these databases can be clustered into specific categories at higher levels of abstraction. Extracting and analyzing itemsets discovered from different levels of abstraction can provide more useful insights into customer behaviors. However, considering item taxonomy increases the problem’s complexity, hence prolonging the execution time needed to explore the search space. Parallelism is thus employed to address this drawback, but previous approaches are not efficient as they only adopt simple scheduling strategies or do not utilize the full capabilities of a multi-core processor. This work introduces three new efficient strategies to significantly boost the performance of the multi-level high utility itemset mining task using multi-core processing. Two new algorithms, called MCML+ and MCML++, are also proposed by adopting the suggested strategies. Extensive experiments on several large databases show that our proposed algorithms have better performance compared to previous approaches in terms of running time and scalability, up to 4.0 times better than the previous parallelized algorithm, the MCML-Miner algorithm; and over 9.0 times faster than the original sequential algorithm, the MLHUI-Miner algorithm.
Mining high-utility sequences with positive and negative values
2023, Information Sciences
Sequence pattern discovery is a fundamental topic in the domain of data mining. It has been widely used to solve various problems (e.g., behavior pattern discovery, gene pattern discovery in bioinformatics, user click pattern mining, etc.). High-utility sequence mining as a novel hot issue is more challenging and has generally attracted plenty of attention. Our paper focuses on mining high-utility sequences in a more complicated environment with high efficiency. Most of the previous methods for utility mining aim to find high-utility sequences suitable for items with positive values, but most real-world situations contain items with both positive and negative values. Several algorithms have been applied to the above sophisticated situation and can be used as our comparing algorithms. In this paper, we introduce the FHUSN (Fast mining High Utility Sequences with Negative item) algorithm to mine high-utility sequences in situations with or without negative utility values. FHUSN utilizes the new utility array to store data. Several new pruning strategies that apply to situations with or without negative values have been used to reduce search space. Experiments are carried out on several benchmark datasets, and experimental results illustrate that our method has better performance.
Mining periodic high-utility itemsets with both positive and negative utilities
2023, Engineering Applications of Artificial Intelligence
Mining high-utility patterns in databases containing items with both positive and negative profits is useful in market basket databases, since negative profits are common in the real world. Obviously, in the market basket database, patterns with stable long-term profits have more meaning. The discovery of itemsets with a consistent high frequency is known as periodic frequent pattern mining. Therefore, mining periodic high-utility patterns in a database containing items with both positive and negative profits is an interesting and useful task. However, this task has two main challenges. First, the utility measure does not have the downward closure property. Second, the huge search space needs to be pruned more effectively. In this paper, we propose a vertical data structure-based algorithm called PHMN to discover periodic high-utility patterns (PHUPs) or itemsets in a transaction database with both positive and negative utilities. To be more efficient, we propose a new upper bound to prune the search space and an improved algorithm to discover the PHUPs. Finally, experiments are conducted to verify the effectiveness and efficiency of algorithms.
HLHUI: An improved version of local high utility itemset mining
2023, Procedia Computer Science
High utility itemsets (HUIs) have been emerged to address the main problems of frequent itemset mining, namely considering the same importance for all items of the dataset and ignoring the occurrence numbers of items within transactions during the mining process. Local and peak HUIs were defined to mine the itemsets which are useful and high utility during specific periods of time. In this paper, using some adopted definitions and strategies of HMiner [19], an improved version of LHUI method [33], called HLHUI (Hminer-based Local HUI mining), is introduced that mines local HUIs using a utility-list-based approach. Performance evaluations of the proposed method show that it can efficiently find useful itemsets.
EHMIN: Efficient approach of list based high-utility pattern mining with negative unit profits
2022, Expert Systems with Applications
High-utility pattern mining is an important sub-literature in the data mining literature. This literature discusses the discovery of useful pattern information from large databases by considering not only supports of patterns but also profits and quantities of items. This literature has the potential to be applied to various problems in the real world, so many methods for the improvement of the algorithm performance have been studied. Moreover, there have also been attempts to extend the flexibility of this literature. The traditional approaches in this literature considered the positive unit profits of items in a given database only. However, this literature can take extended flexibility into account by considering negative as well as positive unit profits of the items. In this paper, we suggest an efficient approach for mining high-utility patterns with negative unit profits. Moreover, the experimental performance tests, which are performed on various real and synthetic datasets in this paper, show that the proposed algorithm has a better performance than the state-of-the-art methods in this literature in terms of the runtime, memory usage, and scalability.
