Estimating Top-k Destinations in Data Streams

Homem, Nuno; Carvalho, Joao Paulo

doi:10.1007/978-3-642-14049-5_30

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6178)

Included in the following conference series:

International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems

Abstract

This paper considers the problem of estimating the most frequent values in a data stream, where an approximate answer is often sufficient. A novel algorithm is presented that approximates the most frequent values by combining counter-based and sketch-based techniques. The algorithm is then used to find the most frequent call destinations of individual customers of telecommunications operators. Because this detection must be performed for every individual customer and kept up to date at all times, fast algorithms with a small memory footprint are critical given the huge number of customers. The paper presents telecommunications customers' behavior to justify the use of approximate algorithms. Although applied here to telecommunications, the algorithm may well be used in other contexts.
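
To make the counter-based half of this idea concrete, the sketch below shows one well-known counter-based technique, Space-Saving, tracking approximate top-k call destinations for a single customer with a fixed number of counters. This is not the authors' hybrid algorithm, which the abstract does not specify. The function names `space_saving` and `top_k`, the counter budget `m`, and the sample destination labels are all assumptions made for illustration.

```python
def space_saving(stream, m):
    """Keep at most m counters over the stream (Space-Saving, illustrative only).

    Returned counts never underestimate the true frequency of a monitored item.
    """
    counts = {}
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < m:
            counts[item] = 1
        else:
            # No free counter: evict the smallest one and let the newcomer
            # inherit its count plus one (a possible overestimate).
            victim = min(counts, key=counts.get)
            counts[item] = counts.pop(victim) + 1
    return counts


def top_k(counts, k):
    """Return the k items with the largest estimated counts."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical call destinations for one customer.
calls = ["A"] * 5 + ["B"] * 3 + ["C", "D", "A", "B"]
estimates = space_saving(calls, m=3)
print(top_k(estimates, 2))
```

Memory is bounded by `m` counters per customer regardless of stream length, which is the property the abstract stresses when the summary must be kept for a very large number of customers.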

Author information

Authors and Affiliations

Instituto Superior Técnico, INESC-ID, TULisbon, R. Alves Redol 9, 1000-029, Lisboa, Portugal
Nuno Homem & Joao Paulo Carvalho

Editor information

Editors and Affiliations

Fachbereich Mathematik und Informatik, Philipps-Universität Marburg, Hans-Meerwein-Str., 35032, Marburg, Germany
Eyke Hüllermeier
Fakultät Informatik, Otto-von-Guericke-Universität Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany
Rudolf Kruse
Fakultät für Elektrotechnik und Informationstechnik, Technische Universität Dortmund, Otto-Hahn-Str. 4, 44227, Dortmund, Germany
Frank Hoffmann

Homem, N., Carvalho, J.P. (2010). Estimating Top-k Destinations in Data Streams. In: Hüllermeier, E., Kruse, R., Hoffmann, F. (eds) Computational Intelligence for Knowledge-Based Systems Design. IPMU 2010. Lecture Notes in Computer Science, vol 6178. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14049-5_30

DOI: https://doi.org/10.1007/978-3-642-14049-5_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14048-8
Online ISBN: 978-3-642-14049-5
eBook Packages: Computer Science, Computer Science (R0)
