Maintaining Frequent Itemsets over High-Speed Data Streams

Cheng, James; Ke, Yiping; Ng, Wilfred

doi:10.1007/11731139_53

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3918)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

We propose a false-negative approach to approximate the set of frequent itemsets (FIs) over a sliding window. Existing approximate algorithms use an error parameter, ε, to control the accuracy of the mining result. However, the use of ε leads to a dilemma. A smaller ε gives a more accurate mining result but higher computational complexity, while increasing ε degrades the mining accuracy. We address this dilemma by introducing a progressively increasing minimum support function. When an itemset is retained in the window longer, we require its minimum support to approach the minimum support of an FI. Thus, the number of potential FIs to be maintained is greatly reduced. Our experiments show that our algorithm not only attains highly accurate mining results, but also runs significantly faster and consumes less memory than do existing algorithms for mining FIs over a sliding window.
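The idea of a progressively increasing minimum support can be illustrated with a minimal sketch. The abstract does not give the actual function, so everything below is an assumption made for illustration: the linear schedule, the names progressive_minsup and should_keep, and the parameters k (time units an itemset has been retained), window_units (window length in time units), sigma (minimum support of an FI) and epsilon (a relaxed starting support). It is not the authors' algorithm, only one possible function with the stated property that the required support approaches sigma the longer an itemset stays in the window.

```python
def progressive_minsup(k: int, window_units: int, sigma: float, epsilon: float) -> float:
    """Relaxed minimum support for an itemset retained for k time units.

    Hypothetical linear schedule (an assumption, not the paper's function):
    the threshold starts at epsilon for a newly tracked itemset (k = 1) and
    rises to sigma once the itemset has been retained for the whole window.
    """
    if not 1 <= k <= window_units:
        raise ValueError("k must lie in [1, window_units]")
    if not 0.0 <= epsilon <= sigma <= 1.0:
        raise ValueError("expected 0 <= epsilon <= sigma <= 1")
    if k == window_units:
        # Full retention: require the real minimum support of an FI.
        return sigma
    fraction = (k - 1) / (window_units - 1)
    return epsilon + (sigma - epsilon) * fraction


def should_keep(count: int, transactions_seen: int, k: int,
                window_units: int, sigma: float, epsilon: float) -> bool:
    """Keep a potential FI only if its count meets the relaxed threshold.

    transactions_seen is the number of transactions that arrived during the
    k time units for which the itemset has been tracked (illustrative).
    """
    threshold = progressive_minsup(k, window_units, sigma, epsilon) * transactions_seen
    return count >= threshold


if __name__ == "__main__":
    # The required support grows with retention time (illustrative values).
    for k in range(1, 6):
        print(k, progressive_minsup(k, window_units=5, sigma=0.1, epsilon=0.01))
```

Under this sketch, an itemset that has only just appeared is kept with a small support, while one that has been tracked across the whole window must reach the full minimum support. Itemsets that stay infrequent are therefore pruned earlier than under a fixed small ε, which is the source of the reduction in maintained itemsets that the abstract describes.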

This work is partially supported by RGC CERG under grant numbers HKUST6185/02E and HKUST6185/03E.

Author information

Authors and Affiliations

Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
James Cheng, Yiping Ke & Wilfred Ng

Editor information

Editors and Affiliations

Nanyang Technological University, Singapore
Wee-Keong Ng
Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan
Masaru Kitsuregawa
School of Computer Science and Technology, Heilongjiang University, China
Jianzhong Li
School of Computer Engineering, Nanyang Technological University, 639798, Singapore, Singapore
Kuiyu Chang

About this paper

Cite this paper

Cheng, J., Ke, Y., Ng, W. (2006). Maintaining Frequent Itemsets over High-Speed Data Streams. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_53

DOI: https://doi.org/10.1007/11731139_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33206-0
Online ISBN: 978-3-540-33207-7
eBook Packages: Computer Science, Computer Science (R0)
