Fast and memory efficient mining of high-utility itemsets from data streams: with and without negative item profits

Li, Hua-Fu; Huang, Hsin-Yun; Lee, Suh-Yin

doi:10.1007/s10115-010-0330-z

Regular Paper
Published: 24 July 2010

Volume 28, pages 495–522, (2011)

Knowledge and Information Systems

Abstract

Mining utility itemsets from data streams is one of the most interesting research issues in data mining and knowledge discovery. In this paper, two efficient sliding window-based algorithms, MHUI-BIT (Mining High-Utility Itemsets based on BITvector) and MHUI-TID (Mining High-Utility Itemsets based on TIDlist), are proposed for mining high-utility itemsets from data streams. Within the sliding window-based framework of the proposed approaches, two effective representations of item information, Bitvector and TIDlist, and a lexicographical tree-based summary data structure, LexTree-2HTU, are developed to improve the efficiency of discovering high-utility itemsets with positive profits from data streams. Experimental results show that the proposed algorithms outperform the existing approaches for discovering high-utility itemsets from data streams over sliding windows. In addition, we propose adapted versions of MHUI-BIT and MHUI-TID to handle the case of mining utility itemsets with negative item profits. Experiments show that these variants of MHUI-BIT and MHUI-TID are efficient approaches for mining high-utility itemsets with negative item profits over transaction-sensitive sliding windows of data streams.
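
The abstract names two item representations, Bitvector and TIDlist, but does not describe them in detail. The following is a minimal Python sketch of how such representations could look over a transaction-sensitive sliding window, assuming positive item profits. It is not the authors' implementation: the class `SlidingWindow`, its methods, and the use of transaction-weighted utility (the summed utility of every window transaction that contains the itemset, a common upper bound in utility mining) are illustrative assumptions, and the LexTree-2HTU structure is not modelled at all.

```python
from collections import deque


class SlidingWindow:
    """Hypothetical window over the most recent `size` transactions."""

    def __init__(self, size, profits):
        self.size = size
        self.profits = profits      # item -> unit profit (assumed positive)
        self.window = deque()       # (tid, {item: quantity}), oldest first
        self.next_tid = 0

    def append(self, transaction):
        # Transaction-sensitive sliding: one arrival evicts the oldest entry.
        if len(self.window) == self.size:
            self.window.popleft()
        self.window.append((self.next_tid, dict(transaction)))
        self.next_tid += 1

    def transaction_utility(self, transaction):
        # Utility of a whole transaction: sum of quantity * unit profit.
        return sum(qty * self.profits[item] for item, qty in transaction.items())

    def tidlists(self):
        # TIDlist view: item -> ids of the window transactions containing it.
        lists = {}
        for tid, transaction in self.window:
            for item in transaction:
                lists.setdefault(item, []).append(tid)
        return lists

    def bitvectors(self):
        # Bitvector view: bit `pos` is set when the transaction at window
        # position `pos` (0 = oldest) contains the item.
        vectors = {}
        for pos, (_, transaction) in enumerate(self.window):
            for item in transaction:
                vectors[item] = vectors.get(item, 0) | (1 << pos)
        return vectors

    def twu(self, itemset):
        # AND the item bitvectors, then sum the utilities of the transactions
        # whose bit survives (assumed transaction-weighted utility).
        vectors = self.bitvectors()
        mask = (1 << len(self.window)) - 1
        for item in itemset:
            mask &= vectors.get(item, 0)
        return sum(
            self.transaction_utility(transaction)
            for pos, (_, transaction) in enumerate(self.window)
            if (mask >> pos) & 1
        )
```

In this sketch the bitvectors are rebuilt on demand for brevity. An incremental version would instead shift each item's vector by one position whenever the window slides, which is the kind of saving a bitvector representation is usually chosen for.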

Author information

Authors and Affiliations

Department of Information Management, Kainan University, Taoyuan, Taiwan
Hua-Fu Li
Department of Computer Science, National Chiao-Tung University, Hsinchu, Taiwan
Hsin-Yun Huang & Suh-Yin Lee

Corresponding author

Correspondence to Hua-Fu Li.

About this article

Cite this article

Li, HF., Huang, HY. & Lee, SY. Fast and memory efficient mining of high-utility itemsets from data streams: with and without negative item profits. Knowl Inf Syst 28, 495–522 (2011). https://doi.org/10.1007/s10115-010-0330-z

Received: 29 November 2009
Revised: 11 May 2010
Accepted: 09 July 2010
Published: 24 July 2010
Issue Date: September 2011
DOI: https://doi.org/10.1007/s10115-010-0330-z
