Variable Support Mining of Frequent Itemsets over Data Streams Using Synopsis Vectors

Lin, Ming-Yen; Hsueh, Sue-Chen; Hwang, Sheng-Kun

doi:10.1007/11731139_84

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3918)

Abstract

Mining frequent itemsets over data streams has emerged as an active research topic in recent years. Previous approaches generally use a fixed support threshold to discover patterns in the stream. In practice, however, the threshold changes to meet users' needs and to suit the characteristics of the incoming data. In a non-streaming environment, changing the threshold implies re-mining the entire set of transactions. In a stream, the "look-once" nature of the data means that discarded transactions cannot be recovered, so re-mining is impossible. We therefore propose a method for variable support mining of frequent itemsets over a data stream. A synopsis vector is constructed to maintain statistics of past transactions and is invoked only when necessary. Experimental results show that our approach is efficient and scalable for variable support mining in data streams.
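
The idea of answering queries at different support thresholds from retained statistics, rather than from the discarded transactions, can be illustrated with a minimal sketch. This is not the authors' synopsis vector, whose structure the abstract does not describe; the class `StreamSummary`, its `max_size` parameter, and the choice to keep exact counts of small itemsets are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


class StreamSummary:
    """Hypothetical summary: exact counts of itemsets up to max_size items.

    This stands in for "statistics of past transactions"; it is an
    illustrative assumption, not the synopsis vector from the paper.
    """

    def __init__(self, max_size=2):
        self.max_size = max_size
        self.counts = Counter()
        self.n_transactions = 0

    def add(self, transaction):
        """Consume one transaction once; the transaction itself is not stored."""
        items = sorted(set(transaction))
        self.n_transactions += 1
        for k in range(1, self.max_size + 1):
            for itemset in combinations(items, k):
                self.counts[itemset] += 1

    def frequent(self, min_support):
        """Return itemsets whose relative support is at least min_support."""
        threshold = min_support * self.n_transactions
        return {s: c for s, c in self.counts.items() if c >= threshold}


summary = StreamSummary(max_size=2)
for t in [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]:
    summary.add(t)

# The same summary answers queries at two different thresholds
# without revisiting any transaction.
high = summary.frequent(0.75)
low = summary.frequent(0.5)
```

Keeping exact counts for every small itemset does not scale to real streams; the sketch only shows why a summary kept alongside the stream lets the threshold change after the transactions are gone.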

Author information

Authors and Affiliations

Department of Information Engineering and Computer Science, Feng-Chia University, Taiwan
Ming-Yen Lin & Sheng-Kun Hwang
Department of Information Management, Chaoyang University of Technology, Taiwan
Sue-Chen Hsueh

Editor information

Editors and Affiliations

Nanyang Technological University, Singapore
Wee-Keong Ng
Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan
Masaru Kitsuregawa
School of Computer Science and Technology, Heilongjiang University, China
Jianzhong Li
School of Computer Engineering, Nanyang Technological University, 639798, Singapore, Singapore
Kuiyu Chang

About this paper

Cite this paper

Lin, MY., Hsueh, SC., Hwang, SK. (2006). Variable Support Mining of Frequent Itemsets over Data Streams Using Synopsis Vectors. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_84

DOI: https://doi.org/10.1007/11731139_84
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33206-0
Online ISBN: 978-3-540-33207-7
eBook Packages: Computer Science, Computer Science (R0)
