Mining top-k high-utility itemsets from a data stream under sliding window model

Dawar, Siddharth; Sharma, Veronica; Goyal, Vikram

doi:10.1007/s10489-017-0939-7

Published: 08 June 2017

Volume 47, pages 1240–1255 (2017)
Applied Intelligence

Siddharth Dawar¹,
Veronica Sharma¹ &
Vikram Goyal¹

Abstract

High-utility itemset mining has gained significant attention in the past few years. It aims to find sets of items, i.e., itemsets, in a database whose utility is no less than a user-defined threshold. The notion of utility gives an analyst more flexibility to mine relevant itemsets. Nowadays, continuous and unbounded streams of data are generated from web clicks, transaction flows from retail stores, sensor networks, etc. Mining high-utility itemsets from a data stream is challenging because the incoming data has to be processed on the fly under time and memory constraints. The number of high-utility itemsets depends on the user-defined threshold: a very low threshold can produce a large number of itemsets, while a very high threshold can produce very few. Setting a threshold that yields a reasonable number of itemsets can therefore be tedious. Top-k high-utility itemset mining was introduced to address this issue, where k is the number of high-utility itemsets in the result set, as specified by the user. In this paper, we propose a data structure and an efficient algorithm for mining top-k high-utility itemsets from a data stream. The algorithm works in a single phase and does not generate any candidates, unlike many algorithms that work in two phases, i.e., candidate generation followed by candidate verification. We conduct extensive experiments on several real and synthetic datasets. Experimental results demonstrate that, in terms of computation time, our proposed algorithm is 20 to 80 times faster on sparse datasets and 300 to 700 times faster on dense datasets than the state-of-the-art algorithm. Furthermore, our proposed algorithm requires less memory than the state-of-the-art algorithm.
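
To make the problem statement concrete, the following Python sketch computes the top-k high-utility itemsets over a sliding window by brute force. It is not the data structure or single-phase algorithm proposed in the paper, which the abstract does not describe; it only illustrates the task definition. Several details are assumptions made for illustration: each transaction is a dict mapping an item to its utility in that transaction, the window holds the most recent `window_size` transactions, ties are broken by itemset order, and the names `itemset_utilities` and `top_k_over_stream` are hypothetical.

```python
from collections import deque
from itertools import combinations


def itemset_utilities(window):
    """Sum the utility of every itemset over all transactions in the window.

    Assumption: a transaction is a dict {item: utility of item in it}.
    Enumerating all subsets is exponential, so this is a toy baseline only.
    """
    totals = {}
    for transaction in window:
        items = sorted(transaction)
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                utility = sum(transaction[item] for item in itemset)
                totals[itemset] = totals.get(itemset, 0) + utility
    return totals


def top_k_over_stream(stream, window_size, k):
    """Yield the k highest-utility itemsets after each arriving transaction."""
    window = deque(maxlen=window_size)  # oldest transaction drops out automatically
    for transaction in stream:
        window.append(transaction)
        totals = itemset_utilities(window)
        # Highest utility first; ties broken by itemset for a deterministic result.
        ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
        yield ranked[:k]


if __name__ == "__main__":
    stream = [
        {"a": 5, "b": 2},
        {"b": 4, "c": 3},
        {"a": 1, "c": 6},
    ]
    for result in top_k_over_stream(stream, window_size=2, k=2):
        print(result)
```

Because the user fixes k rather than a utility threshold, the result size stays constant as the window slides, which is the motivation the abstract gives for the top-k formulation. Recomputing every itemset on each arrival, as this sketch does, is exactly the cost that a dedicated stream algorithm is meant to avoid.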

Acknowledgments

This work was supported in part by the Infosys Centre for Artificial Intelligence, Indraprastha Institute of Information Technology Delhi (IIIT-Delhi), and the Visvesvaraya Ph.D. scheme for Electronics and IT.

Author information

Authors and Affiliations

Department of Computer Science, Indraprastha Institute of Information Technology, Delhi, India
Siddharth Dawar, Veronica Sharma & Vikram Goyal

Corresponding author

Correspondence to Vikram Goyal.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

About this article

Cite this article

Dawar, S., Sharma, V. & Goyal, V. Mining top-k high-utility itemsets from a data stream under sliding window model. Appl Intell 47, 1240–1255 (2017). https://doi.org/10.1007/s10489-017-0939-7

Published: 08 June 2017
Issue Date: December 2017
DOI: https://doi.org/10.1007/s10489-017-0939-7
