RWFIM: Recent weighted-frequent itemsets mining
Introduction
Knowledge Discovery in Databases (KDD) is the process of discovering meaningful and useful information from a collection of data (Agrawal et al., 1993, Agrawal and Srikant, 1995, Han et al., 2004, Srikant and Agrawal, 1996, Yun and Leggett, 2006). Depending on the requirements of different domains and applications, the discovered knowledge can generally be classified as association rules (Agrawal et al., 1993, Chen et al., 1996), sequential patterns (Agrawal and Srikant, 1995, Srikant and Agrawal, 1996, Yun and Leggett, 2006), interesting patterns (Geng and Hamilton, 2006, Hong et al., 2009), and others (Lan et al., 2013, Vo et al., 2013, Yun and Leggett, 2005). Among them, association rules are the most commonly used form of knowledge in KDD; they represent the relationships among items or itemsets in transactional databases. Agrawal and Srikant first developed the two-phase Apriori algorithm (Agrawal and Srikant, 1994), which generates and tests candidates level by level to mine association rules. In the first phase, the frequent itemsets are discovered level by level against a minimum support threshold. In the second phase, association rules are derived from the retrieved frequent itemsets against a minimum confidence threshold. Frequent itemsets mining (FIM), the core task of association-rule mining, has been extensively studied because it is important to a wide range of real-world applications (Han et al., 2004). Many algorithms have been developed to efficiently mine the desired frequent itemsets or association rules in binary databases (Agrawal et al., 1993, Agrawal and Srikant, 1994, Chen et al., 1996, Geng and Hamilton, 2006, Han et al., 2004).
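The two phases described above can be illustrated with a minimal Python sketch. It is not the implementation of Agrawal and Srikant (1994): the function names `frequent_itemsets` and `association_rules`, the in-memory list of transactions, and the relative (fractional) support are all assumptions made for illustration, and no effort is made to count candidates efficiently.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Phase 1: level-wise discovery of itemsets with support >= min_support.

    Returns a dict mapping each frequent itemset (a frozenset) to its
    relative support. Names and data layout are illustrative assumptions.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    support = {}
    # Level 1 candidates: every single item that occurs in the database.
    level = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while level:
        # Test step: count each candidate and keep those that are frequent.
        survivors = {}
        for cand in level:
            sup = sum(1 for t in transactions if cand <= t) / n
            if sup >= min_support:
                survivors[cand] = sup
        support.update(survivors)
        # Generate step: join frequent k-itemsets into (k+1)-candidates and
        # keep a candidate only if all of its k-subsets are frequent.
        k += 1
        level = set()
        for a, b in combinations(list(survivors), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in survivors for sub in combinations(union, k - 1)
            ):
                level.add(union)
    return support


def association_rules(support, min_confidence):
    """Phase 2: derive rules X -> Y with confidence >= min_confidence."""
    rules = []
    for itemset, sup in support.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                x = frozenset(antecedent)
                # Every subset of a frequent itemset is frequent, so
                # support[x] was recorded at an earlier level.
                confidence = sup / support[x]
                if confidence >= min_confidence:
                    rules.append((x, itemset - x, confidence))
    return rules


db = [{"A", "C"}, {"A", "C", "E"}, {"C", "E"}, {"A", "B", "C"}]
fis = frequent_itemsets(db, min_support=0.5)
rules = association_rules(fis, min_confidence=0.8)
```

On this toy database the item B and the itemset (AE) fall below the support threshold, so (ACE) is never generated as a candidate; this is the pruning effect of the level-wise search.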
FIM (Agrawal et al., 1993, Chen et al., 1996, Han et al., 2004) considers only the frequencies of items or itemsets in transactional databases. Other implicit factors such as weight, interest, risk or profit are not considered. Moreover, because every item is assigned the same significance in traditional FIM, the truly significant items or itemsets cannot be easily recognized. For example, suppose the itemsets (AC) and (CE) are both frequent and have the same frequency in a traditional database. If the importance factor of (AC) is 0.9 and that of (CE) is 0.6, then (AC) is more important than (CE), yet traditional FIM treats them alike. Weighted frequent itemsets mining (WFIM) was thus proposed to consider both the weight (importance) and the frequency when mining weighted frequent itemsets (Cai et al., 1998, Lan et al., 2013, Tao et al., 2003, Vo et al., 2013, Yun and Leggett, 2005). In WFIM, the weight of each item (i.e., its importance, interest or risk) can be pre-defined based on users' prior knowledge. An itemset is regarded as a weighted frequent itemset (WFI) if its weighted support is no less than the minimum weighted-support threshold. Cai et al. (1998) first defined a weighted-support model and designed the k-support bound to maintain the anti-monotone property for mining association rules with weights. Yun and Leggett (2005) developed a pattern-growth algorithm that maintains the downward closure property of WFIM. Vo et al. (2013) designed a Weighted Itemset Tidset tree (WIT-tree) and a Diffset strategy to mine the WFIs efficiently. Several further studies have addressed the mining of weighted frequent itemsets or weighted sequential patterns (Lan et al., 2014, Sun and Bai, 2008, Yun and Leggett, 2006).
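The (AC) versus (CE) example can be made concrete with a short Python sketch. The definitions used here are assumptions, not necessarily those of this paper: the weight of an itemset is taken as the average weight of its items, and the weighted support as that weight multiplied by the relative support. The item weights, the toy database, the threshold of 0.4 and the function names are hypothetical; the weights are chosen so that (AC) and (CE) receive the importance factors 0.9 and 0.6 mentioned above.

```python
def itemset_weight(itemset, weights):
    """Assumed definition: average of the weights of the items."""
    return sum(weights[item] for item in itemset) / len(itemset)


def weighted_support(itemset, transactions, weights):
    """Assumed definition: itemset weight times relative support."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return itemset_weight(itemset, weights) * hits / len(transactions)


def is_wfi(itemset, transactions, weights, min_wsup):
    """An itemset is a WFI if its weighted support is no less than min_wsup."""
    return weighted_support(itemset, transactions, weights) >= min_wsup


# Hypothetical item weights and database; (AC) and (CE) both occur twice.
weights = {"A": 1.0, "B": 0.3, "C": 0.8, "D": 0.2, "E": 0.4}
db = [{"A", "C"}, {"A", "C", "E"}, {"C", "E"}, {"B", "D"}]

wsup_ac = weighted_support({"A", "C"}, db, weights)
wsup_ce = weighted_support({"C", "E"}, db, weights)
ac_is_wfi = is_wfi({"A", "C"}, db, weights, min_wsup=0.4)
ce_is_wfi = is_wfi({"C", "E"}, db, weights, min_wsup=0.4)
```

Both itemsets have the same frequency, but under these assumptions only (AC) passes the weighted-support threshold, which is the distinction that plain FIM cannot make.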
Although WFIs can reveal more useful information in the entire database than traditional FIs, the discovered WFIs may be irrelevant to decision making if they occurred only in the distant past. In addition, an itemset may not be a WFI in the entire database but may be a WFI in recent intervals when time sensitivity is considered. For example, the combination (jacket, stocking) may not be a WFI in the entire database but may be a popular product combination in the recent winter season. Besides, different items may have different exhibition periods in a log database. It is unfair to measure interesting patterns without considering time, since an out-of-date WFI may be meaningless and useless for decision making. Recent information and sales trends are more important than old ones and can help managers or retailers make effective decisions. It is thus an important issue to find the recent WFIs rather than the traditional WFIs of the entire database.
Recently, Bouker et al. (2012, 2013, 2014) described how to make a semantic and statistical selection of incomparable association rules, aiming to discover interesting association rules without favoring or excluding any of the measures used. In this study, a new knowledge representation, namely recent weighted-frequent itemsets (RWFIs), is developed to reveal more useful and meaningful weighted-frequent itemsets with time-sensitive consideration. In real-world applications, the recent weighted-frequent itemsets and the weighted-frequent itemsets belong to the same semantic context; these dominated patterns are semantically related (i.e., comparable) and can aid managers or retailers in decision making by revealing the high weighted-frequent patterns. Because of the time-sensitive constraint, the proposed RWFIs contain up-to-date information and can be considered more interesting and helpful than out-of-date patterns.
The RWFIM-P and RWFIM-PE algorithms are developed in this paper to mine the RWFIs efficiently, based respectively on a projection-based approach and on the Estimated Weight of 2-itemset Pruning (EW2P) strategy. Since the discovered RWFIs indicate the recent WFIs, a huge number of redundant or outdated WFIs can be pruned. The contributions of this paper are as follows.
1. A novel type of knowledge, namely recent weighted-frequent itemsets (RWFIs), is designed to reveal more useful and meaningful weighted-frequent itemsets (WFIs) with time-sensitive consideration. To the best of our knowledge, this is the first paper to address the mining of weighted frequent itemsets with both weight and time-sensitive constraints.
2. The RWFIM-P algorithm is proposed as a baseline approach to mine the RWFIs level by level based on the projection mechanism. Since only the sub-databases of the currently processed itemset need to be scanned, rather than the entire database, the RWFIM-P algorithm discovers the RWFIs efficiently.
3. An Estimated Weight of 2-itemset Pruning (EW2P) strategy is developed in the second algorithm, RWFIM-PE, to derive the RWFIs efficiently, which avoids projecting sub-databases and generating a huge number of unpromising candidates.
4. A time-decay strategy is defined to assign the recent weight of each transaction, and it can be adjusted according to the users' specification. A transaction receives a higher decay weight if it is close to the currently processed timestamp, which is more practical in real-world applications for mining the recent weighted-frequent itemsets.
5. Experimental results show that the two proposed algorithms discover the complete and correct set of RWFIs, and that the improved second algorithm outperforms the first in terms of runtime, memory consumption and scalability.
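The time-decay idea in the fourth contribution can be sketched as follows. The exact decay function is not given in this introduction, so the exponential form `(1 - decay) ** (current_time - timestamp)`, the parameter name `decay`, the summation used for the recency of an itemset, and the toy timestamps are all assumptions made for illustration only.

```python
def transaction_recency(timestamp, current_time, decay=0.1):
    """Assumed exponential decay: 1.0 at the current timestamp, smaller for
    older transactions. `decay` is a user-specified factor in [0, 1)."""
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    if timestamp > current_time:
        raise ValueError("timestamp lies after current_time")
    return (1.0 - decay) ** (current_time - timestamp)


def itemset_recency(itemset, timed_transactions, current_time, decay=0.1):
    """Assumed aggregation: sum of the recency values of all transactions
    that contain the itemset."""
    itemset = frozenset(itemset)
    return sum(
        transaction_recency(ts, current_time, decay)
        for ts, items in timed_transactions
        if itemset <= set(items)
    )


# Hypothetical log: (timestamp, items). The pair was bought long ago and
# again just before the current timestamp 10.
log = [
    (1, {"jacket", "stocking"}),
    (9, {"jacket", "stocking"}),
    (10, {"jacket"}),
]
pair_recency = itemset_recency({"jacket", "stocking"}, log, current_time=10, decay=0.5)
```

With a decay factor of 0.5, the purchase at timestamp 9 contributes 0.5 while the one at timestamp 1 contributes almost nothing, so the recency of the pair is dominated by its recent occurrence. A smaller decay factor would let older transactions count for more, which is how a user could tune the strategy.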
The remainder of the paper is organized as follows. Related works are reviewed in Section 2. The developed knowledge of RWFIs is described in Section 3. The proposed RWFIM-P and RWFIM-PE algorithms are presented in Sections 4 and 5, respectively. Experiments are conducted in Section 6. Conclusions are given in Section 7.
Section snippets
Related works
In this section, related work on weighted frequent pattern mining and constraint-based itemset mining is briefly reviewed.
Preliminaries and problem statement
In this section, the preliminaries and problem statement related to recent weighted frequent itemsets mining (RWFIM) from transactional databases are given below.
Projected-based recent weighted-frequent itemsets mining (RWFIM-P) algorithm
In this section, the Recent Weighted-Frequent Itemsets Mining Projected-based (RWFIM-P) algorithm is first designed to mine the RWFIs using a projection mechanism. The RWFIM-P algorithm has two main phases. It first scans the original database to calculate the transactional upper-bound weight (tubw) and the recency value of each transaction. After that, the transactional accumulation upper-bound weight (taubw), the weighted support (wsup), and the recency of each
Projected-based with early pruning recent weighted-frequent itemsets mining (RWFIM-PE) algorithm
In this section, the Recent Weighted-Frequent Itemsets Mining Projected-based approach with Early pruning (RWFIM-PE) algorithm is proposed to improve the RWFIM-P algorithm with an early-termination pruning strategy. Although the recent weighted-frequent upper-bound downward closure (RWFUBDC) property used by the RWFIM-P algorithm effectively reduces the search space, many unpromising candidates still have to be generated and
Experimental results
In this section, the performance of the two proposed algorithms, RWFIM-P and RWFIM-PE, for mining recent weighted-frequent itemsets (RWFIs) is evaluated on four datasets. Note that this is the first paper to consider both weight and time-sensitive constraints for mining RWFIs. The PWA algorithm (Lan et al., 2013) for mining weighted frequent itemsets (WFIs) is implemented as the benchmark against the designed algorithms.
All algorithms in the experiments are implemented in the Java
Conclusions
In this paper, a novel type of knowledge, namely recent weighted-frequent itemsets (RWFIs), is proposed to address the limitations of traditional weighted frequent itemsets mining by considering both weight and time-sensitive constraints. Based on the RWFIs, more meaningful and condensed information about recent trends can be discovered by the two designed algorithms, RWFIM-P and RWFIM-PE. The first algorithm, RWFIM-P, adopts the projection mechanism to project the sub-databases level by level for mining
Acknowledgement
This research was partially supported by the Tencent Project under grant CCF-TencentRAGR20140114, by the Shenzhen Peacock Project, China, under grant KQC201109020055A, by the Natural Scientific Research Innovation Foundation in Harbin Institute of Technology under grant HIT.NSRIF.2014100, and by the Shenzhen Strategic Emerging Industries Program under grant ZDSY20120613125016389.
References (26)
- et al., 2009. An effective mining approach for up-to-date patterns. Expert Syst. Appl.
- et al., 2014. Sliding window based weighted maximal frequent pattern mining over data streams. Expert Syst. Appl.
- et al., 2013. A new method for mining frequent weighted itemsets based on wit-trees. Expert Syst. Appl.
- Agrawal, R., Srikant, R., 1994. Fast algorithms for mining association rules in large databases. In: Proceedings of...
- Agrawal, R., Srikant, R., 1994. Quest Synthetic Data Generator. Available:...
- Agrawal, R., Srikant, R., 1995. Mining sequential patterns. In: Proceedings of International Conference on Data...
- Agrawal, R., Imielinski, T., Swami, A., 1993. Mining association rules between sets of items in large database. In:...
- Bouker, S., Saidi, R., Yahia, S.B., Nguifo, E.M., 2012. Ranking and selecting association rules based on dominance...
- Bouker, S., Saidi, R., Yahia, S.B., Nguifo, E.M., 2013. Towards a semantic and statistical selection of association...
- et al., 2014. Mining undominated association rules through interestingness measures. Int. J. Artif. Intell. Tools
- Data mining: An overview from a database perspective. IEEE Trans. Knowl. Data Eng.