SPPC: a new tree structure for mining erasable patterns in data streams

Le, Tuong; Vo, Bay; Fournier-Viger, Philippe; Lee, Mi Young; Baik, Sung Wook

doi:10.1007/s10489-018-1280-5

Published: 03 September 2018

Volume 49, pages 478–495 (2019)
Applied Intelligence

Tuong Le (ORCID: orcid.org/0000-0003-0909-4974)¹,
Bay Vo²,³,
Philippe Fournier-Viger⁴,
Mi Young Lee¹ &
Sung Wook Baik¹

Abstract

Discovering Erasable Patterns (EPs) consists of identifying product parts whose production, if stopped, would cause only a small loss of profit. This data mining problem has attracted the attention of numerous researchers in recent years because EPs can help manufacturers reduce profit loss. Although many algorithms have been designed to mine EPs, an important limitation of state-of-the-art EP mining algorithms is that they are batch algorithms; that is, they are designed to be applied to static databases. In real-life applications, however, databases are dynamic, as they are constantly updated by adding or removing products and parts. To keep track of EPs in real time, traditional EP mining algorithms must be applied to the database over and over again. This is inefficient, as those algorithms always start from scratch without taking advantage of the results of previous executions. To address this drawback of previous work in handling real-life dynamic data, this paper proposes an efficient algorithm named MSPPC for mining EPs in data streams. It relies on a novel tree structure, the SPPC (Streaming Pre-Post Code) tree, which extends the WPPC tree structure to maintain a compact representation of EPs in a data stream. Experimental results show that MSPPC outperforms the state-of-the-art batch algorithms MERIT and dMERIT when they are run in batch mode over a sliding window. In addition, the proposed algorithm is faster than the state-of-the-art algorithms for mining EPs, namely MERIT, dMERIT+, MEI and EIFDD.
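The core notions in the abstract can be made concrete with a small sketch. It follows the usual formulation of erasable itemset mining (Deng et al. 2009) and assumes that each product is a set of parts with a profit, that the gain of a pattern is the total profit of the products using at least one of its parts, and that a pattern is erasable when its gain is at most ξ (`xi`, a user-given threshold) times the total profit. The class `SlidingWindowItemGains` is a hypothetical helper that contrasts incremental maintenance with re-mining from scratch: it updates single-part gains as products enter and leave a fixed-size window. It is not the SPPC tree or the MSPPC algorithm, which also handle multi-part patterns.

```python
from collections import deque
from typing import Deque, Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical representation (not from the paper): a product is (set of parts, profit).
Product = Tuple[FrozenSet[str], float]


def gain(pattern: FrozenSet[str], products: Iterable[Product]) -> float:
    """Profit lost if all parts in `pattern` stop being produced, i.e. the
    total profit of the products that use at least one of those parts."""
    return sum(profit for items, profit in products if items & pattern)


def is_erasable(pattern: FrozenSet[str], products: List[Product], xi: float) -> bool:
    """Batch check: the pattern is erasable when gain <= xi * total profit."""
    total = sum(profit for _, profit in products)
    return gain(pattern, products) <= xi * total


class SlidingWindowItemGains:
    """Illustrative only (NOT the SPPC tree): keeps single-part gains and the
    total profit of the last `size` products, updated incrementally."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.window: Deque[Product] = deque()
        self.item_gain: Dict[str, float] = {}
        self.item_count: Dict[str, int] = {}  # products in the window using each part
        self.total = 0.0

    def _apply(self, product: Product, sign: int) -> None:
        # sign = +1 when a product enters the window, -1 when it expires.
        items, profit = product
        self.total += sign * profit
        for item in items:
            self.item_gain[item] = self.item_gain.get(item, 0.0) + sign * profit
            self.item_count[item] = self.item_count.get(item, 0) + sign
            if self.item_count[item] == 0:  # part no longer used in the window
                del self.item_gain[item]
                del self.item_count[item]

    def add(self, product: Product) -> None:
        self.window.append(product)
        self._apply(product, +1)
        if len(self.window) > self.size:
            self._apply(self.window.popleft(), -1)  # expire the oldest product

    def erasable_items(self, xi: float) -> Set[str]:
        return {i for i, g in self.item_gain.items() if g <= xi * self.total}


products = [
    (frozenset({"a", "b"}), 100.0),
    (frozenset({"a", "c"}), 50.0),
    (frozenset({"d"}), 20.0),
    (frozenset({"c", "e"}), 30.0),
]
print(is_erasable(frozenset({"d", "e"}), products, 0.35))  # True (gain 50 <= 70)

w = SlidingWindowItemGains(size=3)
for p in products:
    w.add(p)
print(sorted(w.erasable_items(0.35)))  # ['d', 'e'] (window total profit is 100)
```

The gain of a multi-part pattern cannot simply be summed from single-part gains, because a product using several of the parts would be counted more than once (in the window above, `c` and `e` have gains 80 and 30, yet `{c, e}` has gain 80). This is one reason compact structures such as WPPC and SPPC trees are used instead.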

References

Agarwal V, Bharadwaj KK (2015) Predicting the dynamics of social circles in ego networks using pattern analysis and GA K-means clustering. WIREs Data Min Knowl Discov 5(3):113–141
Agrawal R, Imielinski T, Swami AN (1993) Mining association rules between sets of items in large databases. In: ACM SIGMOD’93, pp 207–216
Alpar P, Winkelsträter S (2014) Assessment of data quality in accounting data with association rules. Expert Syst Appl 41(5):2259–2268
Chang J, Lee W (2006) Finding recently frequent itemsets adaptively over online transactional data streams. Inf Syst 31(8):849–869
Chang J, Lee W (2009) estMax: tracing maximal frequent itemsets instantly over online transactional data streams. IEEE Trans Knowl Data Eng 21(10):1418–1431
Chen H (2014) Mining top-k frequent patterns over data streams sliding window. J Intell Inf Syst 42(1):111–131
Chen H, Shu L, Xia J, Deng Q (2012) Mining frequent patterns in a varying-size sliding-window of online transactional data streams. Inf Sci 215:15–36
Chiu S-C, Li H-F, Huang J-L, You H-H (2011) Incremental mining of closed inter-transaction itemsets over data stream sliding windows. J Inf Sci 37(2):208–220
Dakhel AM, Malazi HT, Mahdavi M (2018) A social recommender system using item asymmetric correlation. Appl Intell 48(3):527–540
Deng ZH (2013) Mining top-rank-k erasable itemsets by PID_lists. Int J Intell Syst 28(4):366–379
Deng ZH (2016) DiffNodesets: an efficient structure for fast mining frequent itemsets. Appl Soft Comput 41:214–223
Deng ZH, Xu XR (2012) Fast mining erasable itemsets using NC_sets. Expert Syst Appl 39(4):4453–4463
Deng ZH, Fang G, Wang Z, Xu X (2009) Mining erasable itemsets. In: ICMLC’09, pp 67–73
Deypir M, Sadreddini MH (2011) EclatDS: an efficient sliding-window based frequent pattern mining method for data streams. Intell Data Anal 15(4):571–587
Deypir M, Sadreddini MH, Tarahomi M (2013) An efficient sliding-window based algorithm for adaptive frequent itemset mining over data streams. J Inf Sci Eng 29(5):1001–1020
Fournier-Viger P, Lin JCW, Vo B, Chi TT, Zhang J, Le HB (2017) A survey of itemset mining. WIREs Data Min Knowl Discov 7(4):e1207
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: ACM SIGMOD’00, pp 1–12
Khader N, Lashier A, Yoon SW (2016) Pharmacy robotic dispensing and planogram analysis using association rule mining with prescription data. Expert Syst Appl 57:296–310
Le T, Vo B (2014) MEI: an efficient algorithm for mining erasable itemsets. Eng Appl Artif Intell 27:155–166
Le T, Vo B, Coenen F (2013) An efficient algorithm for mining erasable itemsets using the difference of NC-Sets. In: SMC’13, pp 2270–2274
Le T, Vo B, Nguyen G (2014) A survey of erasable itemset mining algorithms. WIREs Data Min Knowl Discov 4(5):356–379
Le T, Lee MY, Park JR, Baik SW (2018) Oversampling techniques for bankruptcy prediction: novel features from a transaction dataset. Symmetry 10(4):79
Le HS, Chiclana F, Kumar R, Mittal M, Khari M, Chatterjee JM, Baik SW (2018) ARM-AMO: an efficient association rule mining algorithm based on animal migration optimization. Knowl-Based Syst 154:68–80
Le T, Vo B, Baik SW (2018) Efficient algorithms for mining top-rank-k erasable patterns using pruning strategies and the subsume concept. Eng Appl Artif Intell 68:1–9
Le T, Nguyen A, Huynh B, Vo B, Pedrycz W (2018) Mining constrained inter-sequence patterns: a novel approach to cope with item constraints. Appl Intell 48(5):1327–1343
Lee G, Yun U, Ryu K (2014) Sliding-window based weighted maximal frequent pattern mining over data streams. Expert Syst Appl 41(2):694–708
Lee G, Yun U, Ryang H (2015) Mining weighted erasable patterns by using underestimated constraint-based pruning technique. J Intell Fuzzy Syst 28(3):1145–1157
Lee G, Yun U, Ryang H, Kim D (2016) Erasable itemset mining over incremental databases with weight conditions. Eng Appl Artif Intell 52:213–234
Lin CW, Gan W, Fournier-Viger P, Hong TP, Tseng VS (2016) Weighted frequent itemset mining over uncertain databases. Appl Intell 44(1):232–250
Manku GS, Motwani R (2002) Approximate frequency counts over data streams. In: VLDB’02, pp 346–357
Nanda SJ, Panda G (2015) Design of computationally efficient density-based clustering algorithms. Data Knowl Eng 95:23–38
Nguyen G, Le T, Vo B, Le B (2014) A new approach for mining top-rank-k erasable itemsets. In: ACIIDS’14, pp 73–82
Nguyen G, Le T, Vo B, Le B (2015) Discovering erasable closed patterns. In: ACIIDS’15, pp 368–376
Nguyen G, Le T, Vo B, Le B (2015) EIFDD: an efficient approach for erasable itemset mining of very dense datasets. Appl Intell 43(1):85–94
Nori F, Deypir M, Sadreddini MH (2013) A sliding-window based algorithm for frequent closed itemset mining over data streams. J Syst Softw 86(3):615–623
Ryang H, Yun U (2016) High utility pattern mining over data streams with sliding-window technique. Expert Syst Appl 57:214–231
Sahoo J, Das AK, Goswami A (2015) An efficient approach for mining association rules from high utility itemsets. Expert Syst Appl 42(13):5754–5778
Tsai PSM (2010) Mining top-k frequent closed itemsets over data streams using the sliding-window model. Expert Syst Appl 37(10):6968–6973
Vo B, Le T, Coenen F, Hong TP (2016) Mining frequent itemsets using the N-list and subsume concepts. Int J Mach Learn Cybern 7(2):253–265
Vo B, Le T, Nguyen G, Hong TP (2017) Efficient algorithms for mining erasable closed patterns from product datasets. IEEE Access 5:3111–3120
Wang J, Li H, Huang J, Su C (2016) Association rules mining based analysis of consequential alarm sequences in chemical processes. J Loss Prev Process Ind 41:178–185
Yu JX, Chong Z, Lu H, Zhang Z, Zhou A (2006) A false negative approach to mining frequent itemsets from high speed transactional data streams. Inf Sci 176(14):1986–2015
Yun U, Lee G (2016) Sliding-window based weighted erasable stream pattern mining for stream data applications. Futur Gener Comput Syst 59:1–20
Yun U, Kim D, Ryang H, Lee G, Lee KM (2016) Mining recent high average utility patterns based on sliding-window from stream data. J Intell Fuzzy Syst 30(6):3605–3617
Yun U, Ryang H, Lee G, Fujita H (2017) An efficient algorithm for mining high utility patterns from incremental databases with one database scan. Knowl-Based Syst 124:188–206
Yun U, Kim D, Yoon E, Fujita H (2018) Damped window based high average utility pattern mining over data streams. Knowl-Based Syst 144:188–205
Zaki MJ, Hsiao CJ (2005) Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans Knowl Data Eng 17(4):462–478
Zhi-Jun X, Hong C, Li C (2006) An efficient algorithm for frequent itemset mining on data streams. In: ICDM’06, pp 474–491

Acknowledgements

This research was supported by the Korean MSIT (Ministry of Science and ICT), under the National Program for Excellence in SW (2015-0-00938), supervised by the IITP (Institute for Information & communications Technology Promotion).

Author information

Authors and Affiliations

Digital Contents Research Institute, Sejong University, Seoul, Republic of Korea
Tuong Le, Mi Young Lee & Sung Wook Baik
Division of Data Science, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Bay Vo
Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Bay Vo
School of Natural Sciences and Humanities, Harbin Institute of Technology (Shenzhen), Shenzhen, GD, China
Philippe Fournier-Viger

Corresponding author

Correspondence to Sung Wook Baik.

About this article

Cite this article

Le, T., Vo, B., Fournier-Viger, P. et al. SPPC: a new tree structure for mining erasable patterns in data streams. Appl Intell 49, 478–495 (2019). https://doi.org/10.1007/s10489-018-1280-5

Published: 03 September 2018
Issue Date: 15 February 2019
DOI: https://doi.org/10.1007/s10489-018-1280-5
