Fast and memory efficient mining of high-utility itemsets from data streams: with and without negative item profits

  • Regular Paper
Knowledge and Information Systems

Abstract

Mining utility itemsets from data streams is one of the most interesting research issues in data mining and knowledge discovery. In this paper, two efficient sliding window-based algorithms, MHUI-BIT (Mining High-Utility Itemsets based on BITvector) and MHUI-TID (Mining High-Utility Itemsets based on TIDlist), are proposed for mining high-utility itemsets from data streams. Within the sliding window-based framework of the proposed approaches, two effective representations of item information, Bitvector and TIDlist, and a lexicographical tree-based summary data structure, LexTree-2HTU, are developed to improve the efficiency of discovering high-utility itemsets with positive profits from data streams. Experimental results show that the proposed algorithms outperform existing approaches for discovering high-utility itemsets from data streams over sliding windows. In addition, we propose adapted versions of MHUI-BIT and MHUI-TID to handle the case of mining utility itemsets with negative item profits. Experiments show that these variants are efficient approaches for mining high-utility itemsets with negative item profits over transaction-sensitive sliding windows of a data stream.
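To make the terms in the abstract concrete, the following is a minimal Python sketch of a transaction-sensitive sliding window that keeps a TID list per item and can derive a bitvector per item. It is not the MHUI-BIT or MHUI-TID algorithm and does not build LexTree-2HTU; it only illustrates the two item representations and the standard utility definition (quantity times unit profit, summed over the window transactions that contain the whole itemset). The class name `SlidingWindow`, its methods, and the sample data are all hypothetical and chosen for illustration.

```python
from collections import deque


class SlidingWindow:
    """Illustrative window over the last `size` transactions (hypothetical helper)."""

    def __init__(self, size, profit):
        self.size = size
        self.profit = profit      # item -> unit profit (may be negative)
        self.window = deque()     # (tid, {item: quantity}), oldest first
        self.tidlists = {}        # item -> set of TIDs currently in the window
        self.next_tid = 0

    def add(self, transaction):
        """Append a transaction; evict the oldest one once the window is full."""
        if len(self.window) == self.size:
            old_tid, old_tx = self.window.popleft()
            for item in old_tx:
                self.tidlists[item].discard(old_tid)
                if not self.tidlists[item]:
                    del self.tidlists[item]
        tid = self.next_tid
        self.next_tid += 1
        self.window.append((tid, dict(transaction)))
        for item in transaction:
            self.tidlists.setdefault(item, set()).add(tid)

    def bitvector(self, item):
        """Integer whose bit k is set if `item` occurs in the k-th oldest transaction."""
        bits = 0
        for pos, (_, tx) in enumerate(self.window):
            if item in tx:
                bits |= 1 << pos
        return bits

    def utility(self, itemset):
        """Total utility of `itemset` over the window transactions containing all its items."""
        lists = [self.tidlists.get(item, set()) for item in itemset]
        if not lists:
            return 0
        shared_tids = set.intersection(*lists)   # TID-list intersection
        by_tid = dict(self.window)
        return sum(by_tid[tid][item] * self.profit[item]
                   for tid in shared_tids for item in itemset)


if __name__ == "__main__":
    # Sample profits and transactions are made up; item "c" has a negative profit.
    w = SlidingWindow(size=3, profit={"a": 2, "b": 3, "c": -1})
    for tx in [{"a": 1, "b": 2}, {"a": 2, "c": 1}, {"a": 1, "b": 1, "c": 3}, {"b": 4}]:
        w.add(tx)
    min_utility = 5  # assumed threshold
    for itemset in [{"a", "b"}, {"b"}, {"a", "c"}]:
        u = w.utility(itemset)
        print(sorted(itemset), u, "high-utility" if u >= min_utility else "low-utility")
```

In this sketch an itemset counts as high-utility when its window utility reaches the assumed threshold `min_utility`. ANDing two items' bitvectors yields the window positions where both occur, which is the bitvector counterpart of intersecting their TID lists.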

Author information

Corresponding author

Correspondence to Hua-Fu Li.

About this article

Cite this article

Li, HF., Huang, HY. & Lee, SY. Fast and memory efficient mining of high-utility itemsets from data streams: with and without negative item profits. Knowl Inf Syst 28, 495–522 (2011). https://doi.org/10.1007/s10115-010-0330-z

  • DOI: https://doi.org/10.1007/s10115-010-0330-z
