Abstract
In this paper we present an improved version of a multistage-hashing-based algorithm for finding frequent items in a stream. Our algorithm uses low-pass filters instead of simple counters, so it concentrates on recent items and progressively discounts old ones. This behaviour is similar to that of sliding-window algorithms, but it requires less memory and is suitable for real-time applications. The algorithm continuously provides estimates of the frequencies of the most frequent items. It was tested on streams with various frequency distributions and was shown to work correctly.
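The abstract does not give the filter equation, hash functions or thresholds, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's algorithm. It assumes a first-order exponential low-pass filter (each stored value is multiplied by a constant `decay` per stream element) in place of each counter, a multistage filter with `depth` independent hash stages of `width` buckets, and a rule that an item is tracked individually once its bucket in every stage exceeds a `threshold` fraction of the recent stream. All names and parameters (`DecayingCounter`, `RecentFrequentItems`, `depth`, `width`, `half_life`, `threshold`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class DecayingCounter:
    """Hypothetical first-order low-pass filter over a 0/1 event signal.

    Instead of counting hits forever, the value is multiplied by `decay`
    once per stream tick, so old hits fade exponentially. The decay is
    applied lazily: we only remember the tick of the last update.
    """
    value: float = 0.0
    last_tick: int = 0

    def read(self, tick: int, decay: float) -> float:
        return self.value * decay ** (tick - self.last_tick)

    def hit(self, tick: int, decay: float) -> None:
        self.value = self.read(tick, decay) + 1.0
        self.last_tick = tick


class RecentFrequentItems:
    """Hypothetical multistage hashing filter built from decaying counters."""

    def __init__(self, depth=4, width=64, half_life=1000, threshold=0.05):
        # Per-tick decay factor chosen so that a hit loses half its weight
        # after `half_life` stream elements (an assumed parameterisation).
        self.decay = 0.5 ** (1.0 / half_life)
        self.threshold = threshold  # minimum recent frequency to track an item
        self.width = width
        self.stages = [[DecayingCounter() for _ in range(width)]
                       for _ in range(depth)]
        self.tracked = {}  # item -> its own DecayingCounter
        self.tick = 0

    def _bucket(self, stage, item):
        # Salt the built-in hash with the stage index: one hash per stage.
        return hash((stage, item)) % self.width

    def _freq(self, counter):
        # An item of steady frequency p settles near value p / (1 - decay).
        return counter.read(self.tick, self.decay) * (1.0 - self.decay)

    def add(self, item):
        self.tick += 1
        if item in self.tracked:
            self.tracked[item].hit(self.tick, self.decay)
            return
        estimate = math.inf
        for s, stage in enumerate(self.stages):
            counter = stage[self._bucket(s, item)]
            counter.hit(self.tick, self.decay)
            estimate = min(estimate, counter.value)  # value is current after hit
        if estimate * (1.0 - self.decay) >= self.threshold:
            # Every stage reports a heavy bucket: track the item on its own,
            # seeding its filter with the (over)estimate from the stages.
            self.tracked[item] = DecayingCounter(estimate, self.tick)

    def top(self):
        # Forget tracked items whose recent frequency decayed below threshold.
        self.tracked = {k: c for k, c in self.tracked.items()
                        if self._freq(c) >= self.threshold}
        return sorted(((k, self._freq(c)) for k, c in self.tracked.items()),
                      key=lambda kv: -kv[1])
```

Two design choices in this sketch are worth noting. Lazy decay avoids touching every bucket on every element, and because the filter is linear, a bucket's value is the sum of the filtered counts of all items hashed to it, so the minimum over stages is an upper bound on a single item's filtered count. For example, feeding a stream in which `"a"` makes up half of the elements and then stops, followed by a phase in which `"b"` makes up half, should leave `"b"` near 0.5 at the top of `top()` while `"a"` has been forgotten.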
Research has been supported by grant No 3 T11C 002 29 received from Polish Ministry of Education and Science.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kołaczkowski, P. (2007). Memory Efficient Algorithm for Mining Recent Frequent Items in a Stream. In: Kryszkiewicz, M., Peters, J.F., Rybinski, H., Skowron, A. (eds) Rough Sets and Intelligent Systems Paradigms. RSEISP 2007. Lecture Notes in Computer Science, vol 4585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73451-2_51
DOI: https://doi.org/10.1007/978-3-540-73451-2_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73450-5
Online ISBN: 978-3-540-73451-2
eBook Packages: Computer Science, Computer Science (R0)