Abstract
Identifying frequently occurring items is a basic building block in many data stream applications. A great deal of work for efficiently identifying frequent items has been studied on the landmark and sliding window models. In this work, we revisit this problem on a new streaming model based on time decay, where the importance of every arrival item is decreased over the time. To address the importance changes over the time, we propose a new heap structure, named Quasi-heap, which maintains the item order using a lazy update mechanism. Two approximation algorithms, Space Saving with Quasi-heap (SSQ) and Filtered Space Saving with Quasi-heap (FSSQ), are proposed to find the frequently occurring items based on the Quasi-heap structure. Extensive experiments demonstrate the superiority of proposed algorithms in terms of both efficiency (i.e., response time) and effectiveness (i.e., accuracy).
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
Frequent Itemset Mining Dataset Repository http://fimi.cs.helsinki.fi/data/.
References
Brijs, T., Swinnen, G., Vanhoof, K., Wets, G.: Using association rules for product assortment decisions: a case study. In: SIGKDD, pp. 254–260. ACM (1999)
Chakrabarti, A., Cormode, G., McGregor, A.: A near-optimal algorithm for computing the entropy of a stream. In: ACM-SIAM Symposium on Discrete Algorithms, pp. 328–335. Society for Industrial and Applied Mathematics (2007)
Chang, J.H., Lee, W.S.: Finding recent frequent itemsets adaptively over online data streams. In: SIGKDD, pp. 487–492. ACM (2003)
Chen, L., Mei, Q.: Mining frequent items in data stream using time fading model. Inf. Sci. 257, 54–69 (2014)
Chen, L., Zhang, S., Tu, L.: An algorithm for mining frequent items on data stream using fading factor. In: COMPSAC, vol. 2, pp. 172–177. IEEE (2009)
Chen, L., Zou, L.J., Tu, L.: A clustering algorithm for multiple data streams based on spectral component similarity. Inf. Sci. 183(1), 35–47 (2012)
Cormode, G., Hadjieleftheriou, M.: Finding the frequent items in streams of data. Commun. ACM 52(10), 97–105 (2009)
Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55(1), 58–75 (2005)
Cormode, G., Shkapenyuk, V., Srivastava, D., Xu, B.: Forward decay: a practical time decay model for streaming systems. In: ICDE, pp. 138–149. IEEE (2009)
Homem, N., Carvalho, J.P.: Finding top-k elements in data streams. Inf. Sci. 180(24), 4958–4974 (2010)
Karp, R.M., Shenker, S., Papadimitriou, C.H.: A simple algorithm for finding frequent elements in streams and bags. TODS 28(1), 51–55 (2003)
Lim, Y., Choi, J., Kang, U.: Fast, accurate, and space-efficient tracking of time-weighted frequent items from data streams. In: CIKM, pp. 1109–1118. ACM (2014)
Manerikar, N., Palpanas, T.: Frequent items in streaming data: an experimental evaluation of the state-of-the-art. Data Knowl. Eng. 68(4), 415–430 (2009)
Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: PVLDB, pp. 346–357. VLDB Endowment (2002)
Mei, Q.L., Chen, L.: An algorithm for mining frequent stream data items using hash function and fading factor. In: Applied Mechanics and Materials, vol. 130, pp. 2661–2665. Trans Tech Publications (2012)
Metwally, A., Agrawal, D., Abbadi, A.E.: An integrated efficient solution for computing frequent and top-k elements in data streams. TODS 31(3), 1095–1133 (2006)
Shaker, A., Senge, R., Hüllermeier, E.: Evolving fuzzy pattern trees for binary classification on data streams. Inf. Sci. 220, 34–45 (2013)
Tong, Y., Zhang, X., Chen, L.: Tracking frequent items over distributed probabilistic data. World Wide Web 19(4), 1–26 (2015)
Zhang, S., Chen, L., Tu, L.: Frequent items mining on data stream based on time fading factor. In: AICI, vol. 4, pp. 336–340. IEEE (2009)
Acknowledgement
This work was supported by the public key plan of Zhejiang Province (2014C23005), National Science and Technology Supporting plan (2013BAH62F02 and 2013BAH27F01), China mobile research fund of education ministry (mcm20130671), the cultural relic protection science and technology project of Zhejiang Province, University of Macau RC (MYRG2014-00106-FST), and NSFC of China (61502548).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Wu, S., Lin, H., U, L.H., Gao, Y., Lu, D. (2016). Finding Frequent Items in Time Decayed Data Streams. In: Li, F., Shim, K., Zheng, K., Liu, G. (eds) Web Technologies and Applications. APWeb 2016. Lecture Notes in Computer Science(), vol 9932. Springer, Cham. https://doi.org/10.1007/978-3-319-45817-5_2
Download citation
DOI: https://doi.org/10.1007/978-3-319-45817-5_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45816-8
Online ISBN: 978-3-319-45817-5
eBook Packages: Computer ScienceComputer Science (R0)