RWFIM: Recent weighted-frequent itemsets mining

doi:10.1016/j.engappai.2015.06.009

Engineering Applications of Artificial Intelligence

Volume 45, October 2015, Pages 18-32

https://doi.org/10.1016/j.engappai.2015.06.009

Abstract

In recent years, weighted frequent itemsets mining (WFIM) has become an important topic in data mining because it can discover more useful and interesting patterns in real-world applications than traditional frequent itemsets mining. Many algorithms have been developed to find weighted frequent itemsets (WFIs), but they do not take time sensitivity into account. The out-of-date information they discover may, however, be meaningless and useless for decision making. In this paper, a novel framework, namely recent weighted-frequent itemsets mining (RWFIM), is proposed to address both the weight and time-sensitive constraints. A projection-based algorithm, RWFIM-P, is first proposed for mining the designed recent weighted-frequent itemsets (RWFIs) with weight and time-sensitive consideration. It uses a projection-and-test mechanism to discover RWFIs recursively. In RWFIM-P, the entire database is projected and divided into several sub-databases according to the currently processed itemset, thus reducing the computational cost and memory requirements. A second algorithm, RWFIM-PE, is then proposed to improve on RWFIM-P. It relies on the developed Estimated Weight of 2-itemset Pruning (EW2P) strategy to mine the RWFIs without generating unpromising candidates, thus avoiding projection computations that RWFIM-P would otherwise perform. Experiments are conducted to evaluate the two proposed algorithms against traditional WFIM in terms of execution time, number of generated RWFIs, and scalability under varied values of the two minimum thresholds on several real-world and synthetic datasets.

Introduction

Knowledge Discovery in Databases (KDD) is a process used to discover meaningful and useful information from a collection of data (Agrawal et al., 1993, Agrawal and Srikant, 1995, Han et al., 2004, Srikant and Agrawal, 1996, Yun and Leggett, 2006). Depending on the requirements of different domains and applications, the discovered knowledge can generally be classified as association rules (Agrawal et al., 1993, Chen et al., 1996), sequential patterns (Agrawal and Srikant, 1995, Srikant and Agrawal, 1996, Yun and Leggett, 2006), interesting patterns (Geng and Hamilton, 2006, Hong et al., 2009), and others (Lan et al., 2013, Vo et al., 2013, Yun and Leggett, 2005). Among them, association rules are the most commonly used type of knowledge in KDD; they represent the relationships among items or itemsets in transactional databases. Agrawal and Srikant (1994) first developed the two-phase Apriori algorithm, which generates and tests candidates level by level to mine association rules. In the first phase, the frequent itemsets are discovered level by level based on a minimum support threshold. In the second phase, the retrieved frequent itemsets are used to infer association rules based on a minimum confidence threshold. Frequent itemsets mining (FIM), the first phase of association-rule mining, has been extensively studied as an important task for a wide range of real-world applications (Han et al., 2004). Many algorithms have been developed to efficiently mine the desired frequent itemsets or association rules in binary databases (Agrawal et al., 1993, Agrawal and Srikant, 1994, Chen et al., 1996, Geng and Hamilton, 2006, Han et al., 2004).
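The two-phase, level-wise scheme described above can be illustrated with a minimal Python sketch. It is not the original Apriori implementation: the function names `apriori` and `association_rules`, the use of relative support, and the toy data in the usage note are assumptions made here for illustration only.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Phase 1: level-wise frequent itemset discovery (min_support is a fraction)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        # Join frequent k-itemsets into (k+1)-candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Drop candidates with an infrequent k-subset, then test the rest.
        level = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k))
            and support(c) >= min_support
        }
        k += 1
    return frequent

def association_rules(frequent, min_confidence):
    """Phase 2: derive rules antecedent -> consequent from the frequent itemsets."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                confidence = sup / frequent[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, confidence))
    return rules
```

For instance, calling `apriori([{"A", "C"}, {"A", "C", "E"}, {"C", "E"}, {"A", "B"}], 0.5)` keeps the itemsets (A), (C), (E), (AC) and (CE); passing that result to `association_rules` with a minimum confidence then yields the rules.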

FIM (Agrawal et al., 1993, Chen et al., 1996, Han et al., 2004) only considers the frequencies of items or itemsets in transactional databases. Other implicit factors such as weight, interest, risk or profit are not considered. Besides, each item is assigned the same significance in traditional FIM, so the truly significant items or itemsets cannot be easily recognized. For example, suppose the itemsets (AC) and (CE) are both frequent with the same frequency in a traditional database. The itemset (AC) is nevertheless more important than the itemset (CE) when the importance factor of (AC) is 0.9 and that of (CE) is 0.6. Weighted frequent itemsets mining (WFIM) was thus proposed to consider both the weight (importance) and the frequency factors when mining weighted frequent itemsets (Cai et al., 1998, Lan et al., 2013, Tao et al., 2003, Vo et al., 2013, Yun and Leggett, 2005). Accordingly, the weight (i.e. the importance, interest or risk) of each item can be pre-defined based on the users' prior knowledge. An itemset is regarded as a weighted frequent itemset (WFI) if its weighted support is no less than the minimum weighted-support threshold. Cai et al. (1998) first defined a weighted-support model and further designed the k-support bound to maintain the anti-monotone property for mining association rules with weight consideration. Yun and Leggett (2005) developed a pattern-growth algorithm that maintains the downward closure property of WFIM. Vo et al. (2013) also designed a Weighted Itemset Tidset tree (WIT-tree) and a Diffset strategy to efficiently mine the WFIs. Several other studies have also been carried out to mine weighted frequent itemsets or weighted sequential patterns (Lan et al., 2014, Sun and Bai, 2008, Yun and Leggett, 2006).
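The (AC) versus (CE) example can be made concrete with a short Python sketch. The passage does not spell out how the weighted support is computed, so the sketch assumes one common convention: the weight of an itemset is the average of its item weights, and the weighted support is that weight multiplied by the relative frequency. The item weights, the transactions and the function names are hypothetical, chosen so that (AC) has weight 0.9 and (CE) has weight 0.6.

```python
# Hypothetical item weights: (A, C) averages to 0.9 and (C, E) to 0.6.
weights = {"A": 1.0, "B": 0.5, "C": 0.8, "E": 0.4}

# Hypothetical transactions in which (AC) and (CE) have the same frequency.
transactions = [{"A", "C"}, {"A", "C", "E"}, {"C", "E"}, {"A", "B"}]

def itemset_weight(itemset, weights):
    """Assumed definition: average weight of the items in the itemset."""
    return sum(weights[i] for i in itemset) / len(itemset)

def weighted_support(itemset, transactions, weights):
    """Assumed definition: itemset weight times relative frequency."""
    itemset = frozenset(itemset)
    count = sum(1 for t in transactions if itemset <= frozenset(t))
    return itemset_weight(itemset, weights) * count / len(transactions)

def is_wfi(itemset, transactions, weights, min_wsup):
    """An itemset is a WFI if its weighted support reaches the threshold."""
    return weighted_support(itemset, transactions, weights) >= min_wsup
```

With a minimum weighted-support threshold of 0.4, `is_wfi("AC", transactions, weights, 0.4)` holds while `is_wfi("CE", transactions, weights, 0.4)` does not, even though both itemsets occur in two of the four transactions.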

Although WFIs can reveal more useful information in the entire database than traditional FIs, the discovered WFIs may be irrelevant to decision making if they occurred only in the distant past. In addition, an itemset may not be a WFI in the entire database but may be a WFI in recent intervals when time sensitivity is considered. For example, the combination (jacket, stocking) may not be a WFI in the entire database but can be a popular product combination in the recent winter season. Besides, different items may have different exhibition periods in a log database. It is unfair to measure interesting patterns without considering time, since an out-of-date WFI may be meaningless and useless for decision making. Recent information or sales trends are more important than old ones and can help managers or retailers make efficient decisions. Finding the recent WFIs, rather than the traditional WFIs over the entire database, is thus a significant issue.

Recently, Bouker et al. (2012, 2013, 2014) studied how to make a semantic and statistical selection of incomparable association rules, aiming to discover interesting association rules without favoring or excluding any of the measures used. In this study, a new knowledge representation, namely recent weighted-frequent itemsets (RWFIs), is developed to reveal more useful and meaningful weighted-frequent itemsets with time-sensitive consideration. In real-world applications, the recent weighted-frequent itemsets and the weighted-frequent itemsets belong to the same semantic context; these dominated patterns are semantically related (i.e. comparable) and can aid managers or retailers in decision making by revealing the high weighted-frequent patterns. Because the time-sensitive constraint is considered, the proposed RWFIs contain up-to-date information and can be regarded as more interesting and helpful patterns than out-of-date ones.

The RWFIM-P and RWFIM-PE algorithms are developed in this paper to efficiently mine the RWFIs, based respectively on a projection-based approach and on the Estimated Weight of 2-itemset Pruning (EW2P) strategy. Since the discovered RWFIs indicate the recent WFIs, a huge number of redundant or older WFIs can be pruned. The contributions of this paper are described below.

1. A novel type of knowledge, namely recent weighted-frequent itemsets (RWFIs), is designed to reveal more useful and meaningful weighted-frequent itemsets (WFIs) with time-sensitive consideration. To the best of our knowledge, this is the first paper to address the mining of weighted frequent itemsets under both weight and time-sensitive constraints.
2. The RWFIM-P algorithm is proposed as a baseline approach to mine the RWFIs level by level based on the projection mechanism. Since only the sub-databases of the currently processed itemset need to be scanned, instead of the entire database, the RWFIM-P algorithm discovers the RWFIs efficiently.
3. An Estimated Weight of 2-itemset Pruning (EW2P) strategy is developed in the second algorithm, RWFIM-PE, to derive the RWFIs efficiently, avoiding both the projection of sub-databases and the generation of a huge number of unpromising candidates.
4. A time-decay strategy is also defined to assign a recency weight to each transaction, which can be adjusted according to the users' specification. A transaction receives a higher decay weight if it is closer to the currently processed timestamp, which is more practical in real-world applications for mining the recent weighted-frequent itemsets.
5. Experimental results show that both proposed algorithms discover the complete and correct set of RWFIs, and that the second, improved algorithm outperforms the first in terms of runtime, memory consumption and scalability.
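To give a feel for the time-decay idea in the contributions above, the following Python sketch assigns each transaction a recency weight that shrinks with its distance from the current timestamp. The exponential form `(1 - decay) ** (current_time - timestamp)`, the parameter name `decay`, and the way recency is summed over the transactions containing an itemset are assumptions for illustration; this section does not give the exact formula.

```python
def recency(timestamp, current_time, decay=0.1):
    """Assumed decay model: weight 1.0 at current_time, shrinking with age.

    `decay` is a user-specified factor in (0, 1]; larger values forget faster.
    """
    return (1.0 - decay) ** (current_time - timestamp)

def itemset_recency(itemset, timed_transactions, current_time, decay=0.1):
    """Sum the recency of every (timestamp, items) transaction containing itemset."""
    itemset = frozenset(itemset)
    return sum(
        recency(ts, current_time, decay)
        for ts, items in timed_transactions
        if itemset <= frozenset(items)
    )

# Hypothetical log: (timestamp, items). Timestamp 10 is the most recent.
log = [(8, {"A", "C"}), (9, {"A", "C", "E"}), (10, {"C", "E"})]
```

In this toy log, (AC) and (CE) each appear in two transactions, but `itemset_recency("CE", log, 10)` is larger than `itemset_recency("AC", log, 10)` because (CE) occurs in the more recent transactions. Under this sketch, an itemset would be reported only if both its recency and its weighted support reach their respective minimum thresholds.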

The remainder of the paper is organized as follows. Related works are reviewed in Section 2. The developed knowledge of RWFIs is described in Section 3. The two proposed algorithms, RWFIM-P and RWFIM-PE, are presented in Sections 4 and 5, respectively. Experiments are conducted in Section 6. Conclusions are given in Section 7.

Section snippets

Related works

In this section, the related works of weighted-based frequent patterns mining and constraint-based itemsets mining are briefly reviewed.

Preliminaries and problem statement

In this section, the preliminaries and problem statement related to recent weighted frequent itemsets mining (RWFIM) from transactional databases are given below.

Projected-based recent weighted-frequent itemsets mining (RWFIM-P) algorithm

In this section, a Recent Weighted-Frequent Itemsets Mining Projected-based (RWFIM-P) algorithm is first designed to mine the RWFIs based on a projection mechanism. The RWFIM-P algorithm has two main phases. It first scans the original database to calculate the transactional upper-bound weight (tubw) and the recency value of each transaction. After that, the transactional accumulation upper-bound weight (taubw), the weighted support (wsup), and the recency of each
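The snippet above is truncated, so the details of RWFIM-P are not available here. As a rough illustration of the general projection-and-test idea only, the Python sketch below recursively grows a prefix itemset and scans just the sub-database projected on that prefix. It tests a plain occurrence count rather than the tubw, taubw, wsup and recency measures named above, and the names `project` and `mine` are hypothetical.

```python
from collections import Counter

def project(database, item):
    """Keep transactions containing `item`, trimmed to the items after it.

    Each transaction is a tuple of distinct items in a fixed (sorted) order.
    """
    projected = []
    for transaction in database:
        if item in transaction:
            projected.append(transaction[transaction.index(item) + 1:])
    return projected

def mine(database, min_count, prefix=()):
    """Recursively extend `prefix`, scanning only its projected sub-database."""
    results = {}
    counts = Counter(item for t in database for item in t)
    for item in sorted(counts):
        if counts[item] >= min_count:      # "test" step: plain count in this sketch
            itemset = prefix + (item,)
            results[itemset] = counts[item]
            # "Projection" step: recurse on the sub-database of the new itemset.
            results.update(mine(project(database, item), min_count, itemset))
    return results
```

Each recursive call sees only the suffixes of the transactions that contain the current itemset, which is why a projection-based miner can avoid rescanning the entire database for longer itemsets.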

Projected-based with early pruning recent weighted-frequent itemsets mining (RWFIM-PE) algorithm

In this section, a Recent Weighted-Frequent Itemsets Mining Projected-based approach with Early pruning (RWFIM-PE) algorithm is further proposed to improve the RWFIM-P algorithm with the developed early termination pruning strategy. Although the recent weighted-frequent upper-bound downward closure (RWFUBDC) property used in the RWFIM-P algorithm reduces the search space efficiently, many unpromising candidates are still required to be generated and

Experimental results

In this section, the performance of the two proposed algorithms, RWFIM-P and RWFIM-PE, for mining recent weighted-frequent itemsets (RWFIs) is evaluated on four datasets. Note that this is the first paper to consider both the weight and time-sensitive constraints for mining RWFIs. The PWA algorithm (Lan et al., 2013) for mining weighted frequent itemsets (WFIs) is implemented as the benchmark against our designed algorithms.

All algorithms in the experiments are implemented in the Java

Conclusions

In this paper, a novel type of knowledge, namely recent weighted-frequent itemsets (RWFIs), is proposed to overcome the limitations of traditional weighted frequent itemsets mining by considering both the weight and time-sensitive constraints. Based on the developed RWFIs, more meaningful and condensed information about recent trends can be discovered by the two designed algorithms, RWFIM-P and RWFIM-PE. The first RWFIM-P algorithm adopts the projection mechanism to level-wisely project the sub-databases for mining

Acknowledgement

This research was partially supported by the Tencent Project under grant CCF-TencentRAGR20140114, by the Shenzhen Peacock Project, China, under grant KQC201109020055A, by the Natural Scientific Research Innovation Foundation in Harbin Institute of Technology under grant HIT.NSRIF.2014100, and by the Shenzhen Strategic Emerging Industries Program under grant ZDSY20120613125016389.

References (26)

T.P. Hong et al. An effective mining approach for up-to-date patterns. Expert Syst. Appl. (2009)
G. Lee et al. Sliding window based weighted maximal frequent pattern mining over data streams. Expert Syst. Appl. (2014)
B. Vo et al. A new method for mining frequent weighted itemsets based on wit-trees. Expert Syst. Appl. (2013)
Agrawal, R., Srikant, R., 1994. Fast algorithms for mining association rules in large databases. In: Proceedings of...
Agrawal, R., Srikant, R., 1994. Quest Synthetic Data Generator. Available:...
Agrawal, R., Srikant, R., 1995. Mining sequential patterns. In: Proceedings of International Conference on Data...
Agrawal, R., Imielinski, T., Swami, A., 1993. Mining association rules between sets of items in large database. In:...
Bouker, S., Saidi, R., Yahia, S.B., Nguifo, E.M., 2012. Ranking and selecting association rules based on dominance...
Bouker, S., Saidi, R., Yahia, S.B., Nguifo, E.M., 2013. Towards a semantic and statistical selection of association...
S. Bouker et al. Mining undominated association rules through interestingness measures. Int. J. Artif. Intell. Tools (2014)
Cai, C.H., Fu, A.W.C., Cheng, C.H., Kwong, W.W., 1998. Mining association rules with weighted items. In: Proceedings of...
M.S. Chen et al. Data mining: An overview from a database perspective. IEEE Trans. Knowl. Data Eng. (1996)
Frequent Itemset Mining Dataset Repository, 2012. Available:...

