Finding Frequent Items in Time Decayed Data Streams

Wu, Shanshan; Lin, Huaizhong; U, Leong Hou; Gao, Yunjun; Lu, Dongming

doi:10.1007/978-3-319-45817-5_2

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9932)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

Identifying frequently occurring items is a basic building block of many data stream applications. A great deal of work has studied how to identify frequent items efficiently under the landmark and sliding window models. In this work, we revisit the problem under a new streaming model based on time decay, in which the importance of each arriving item decreases over time. To handle these changes in importance, we propose a new heap structure, named Quasi-heap, which maintains the item order using a lazy update mechanism. Building on the Quasi-heap, we propose two approximation algorithms for finding frequently occurring items: Space Saving with Quasi-heap (SSQ) and Filtered Space Saving with Quasi-heap (FSSQ). Extensive experiments demonstrate the superiority of the proposed algorithms in terms of both efficiency (i.e., response time) and effectiveness (i.e., accuracy).
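
The time-decay model described above can be illustrated with a minimal Python sketch. It is not the authors' SSQ or FSSQ algorithm and it does not implement the Quasi-heap, whose details the abstract does not give. It assumes exponential decay with a hypothetical rate `decay_rate`, applies the classic Space Saving replacement rule, and finds the minimum counter by a plain linear scan. The class name, method names, and parameters are all invented for illustration.

```python
import math


class DecayedSpaceSaving:
    """Space Saving over exponentially decayed counts (illustrative sketch only)."""

    def __init__(self, capacity, decay_rate, landmark=0.0):
        self.capacity = capacity      # maximum number of monitored items
        self.decay_rate = decay_rate  # assumed exponential decay rate (hypothetical)
        self.landmark = landmark      # fixed reference time for scaled weights
        self.counters = {}            # item -> scaled weight (not yet normalized)

    def _scale(self, timestamp):
        # An arrival at time t is stored as exp(rate * (t - landmark)).
        # Dividing by _scale(now) at query time yields exp(-rate * (now - t)).
        # Note: this grows with time and would need rescaling on long streams.
        return math.exp(self.decay_rate * (timestamp - self.landmark))

    def update(self, item, timestamp):
        weight = self._scale(timestamp)
        if item in self.counters:
            self.counters[item] += weight
        elif len(self.counters) < self.capacity:
            self.counters[item] = weight
        else:
            # Space Saving rule: evict the smallest counter and let the
            # new item inherit its value (this can overestimate counts).
            victim = min(self.counters, key=self.counters.get)
            inherited = self.counters.pop(victim)
            self.counters[item] = inherited + weight

    def decayed_count(self, item, now):
        # Estimated decayed count of `item` as seen at time `now`.
        return self.counters.get(item, 0.0) / self._scale(now)

    def frequent_items(self, now, threshold):
        # Monitored items whose estimated decayed count reaches the threshold,
        # largest first.
        norm = self._scale(now)
        hits = [(item, w / norm) for item, w in self.counters.items()
                if w / norm >= threshold]
        return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Because every stored weight is scaled relative to the same landmark, the relative order of the counters does not change as time passes; only the normalization at query time does. A short usage example, with a decay rate chosen so that an item's weight halves every time unit:

```python
summary = DecayedSpaceSaving(capacity=2, decay_rate=math.log(2))
for item, t in [("a", 0), ("a", 1), ("b", 1), ("c", 2)]:
    summary.update(item, t)
print(summary.frequent_items(now=2, threshold=1.0))
```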

Notes

1. Frequent Itemset Mining Dataset Repository: http://fimi.cs.helsinki.fi/data/

Acknowledgement

This work was supported by the public key plan of Zhejiang Province (2014C23005), National Science and Technology Supporting plan (2013BAH62F02 and 2013BAH27F01), China mobile research fund of education ministry (mcm20130671), the cultural relic protection science and technology project of Zhejiang Province, University of Macau RC (MYRG2014-00106-FST), and NSFC of China (61502548).

Author information

Authors and Affiliations

College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Shanshan Wu, Huaizhong Lin, Yunjun Gao & Dongming Lu
Department of Computer and Information Science, University of Macau, Macau, China
Leong Hou U

Corresponding author

Correspondence to Huaizhong Lin.

Editor information

Editors and Affiliations

School of Computing, University of Utah, Salt Lake City, Utah, USA
Feifei Li
School of Electrical Engineering, Seoul National University, Seoul, Korea (Republic of)
Kyuseok Shim
Soochow University, Suzhou, China
Kai Zheng
Soochow University, Suzhou, China
Guanfeng Liu

Cite this paper

Wu, S., Lin, H., U, L.H., Gao, Y., Lu, D. (2016). Finding Frequent Items in Time Decayed Data Streams. In: Li, F., Shim, K., Zheng, K., Liu, G. (eds) Web Technologies and Applications. APWeb 2016. Lecture Notes in Computer Science, vol 9932. Springer, Cham. https://doi.org/10.1007/978-3-319-45817-5_2

DOI: https://doi.org/10.1007/978-3-319-45817-5_2
Published: 18 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45816-8
Online ISBN: 978-3-319-45817-5
eBook Packages: Computer Science, Computer Science (R0)