Abstract
There is much focus in the algorithms and database communities on designing tools to manage and mine data streams. Typically, data streams consist of multiple signals. Formally, a stream of multiple signals is a sequence of pairs $(i, a_{i,j})$, where the $i$'s correspond to the domain, the $j$'s index the different signals, and $a_{i,j} \geq 0$ gives the value of the $j$th signal at point $i$. We study the problem of finding norms that are cumulative of the multiple signals in the data stream.
For example, consider the max-dominance norm, defined as $\sum_i \max_j \{a_{i,j}\}$. It may be thought of as estimating the norm of the “upper envelope” of the multiple signals, or alternatively, as estimating the norm of the “marginal” distribution of tabular data streams. It is used in applications to estimate the “worst case influence” of multiple processes, for example in IP traffic analysis, electrical grid monitoring, and the financial domain. In addition, it is a natural measure, generalizing the union of data streams or counting distinct elements in data streams.
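To make the definition concrete, the following is a minimal sketch of an exact, non-streaming computation of max-dominance. It is not the algorithm of this paper: it stores one value per domain point, which is exactly the linear space that a streaming algorithm is meant to avoid. The function name `max_dominance` and the representation of the stream as a list of `(i, value)` pairs are assumptions made for illustration.

```python
from collections import defaultdict


def max_dominance(stream):
    """Exact max-dominance: sum over i of the largest value seen at point i.

    `stream` is an iterable of (i, value) pairs with value >= 0.
    Hypothetical baseline for illustration; uses space linear in the domain.
    """
    upper = defaultdict(int)  # upper envelope; unseen points count as 0
    for i, value in stream:
        if value > upper[i]:
            upper[i] = value
    return sum(upper.values())


# Three signals interleaved over the domain {0, 1, 2}.
stream = [(0, 3), (1, 1), (0, 5), (2, 2), (1, 4), (2, 2)]
print(max_dominance(stream))  # 5 + 4 + 2 = 11

# When every value is 1, max-dominance counts the distinct domain points.
print(max_dominance([(7, 1), (3, 1), (7, 1), (9, 1)]))  # 3
```

The second call illustrates the sense in which max-dominance generalizes counting distinct elements: with all values equal to 1, the upper envelope is 1 at each point that appears and 0 elsewhere.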
We present the first known data stream algorithms for estimating max-dominance of multiple signals. In particular, we use workspace and time-per-item that are both sublinear (in fact, poly-logarithmic) in the input size. In contrast, other notions of dominance on streams $a$, $b$, namely min-dominance ($\sum_i \min_j \{a_{i,j}\}$), count-dominance ($|\{i \mid a_i > b_i\}|$), and relative-dominance ($\sum_i a_i / \max\{1, b_i\}$), are all impossible to estimate accurately with sublinear space.
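For comparison, the three other dominance notions can be written out directly when both signals are fully available in memory. This is only a sketch of the definitions under that assumption, with the signals given as equal-length lists and the function names chosen here for illustration. It says nothing about streaming estimation, which the abstract states is impossible for these quantities in sublinear space.

```python
def min_dominance(a, b):
    """Sum over i of min(a_i, b_i)."""
    return sum(min(x, y) for x, y in zip(a, b))


def count_dominance(a, b):
    """Number of points i at which a_i > b_i."""
    return sum(1 for x, y in zip(a, b) if x > y)


def relative_dominance(a, b):
    """Sum over i of a_i / max(1, b_i)."""
    return sum(x / max(1, y) for x, y in zip(a, b))


# Two hypothetical signals over the domain {0, 1, 2, 3}.
a = [3, 4, 2, 0]
b = [6, 1, 2, 1]

print(min_dominance(a, b))       # 3 + 1 + 2 + 0 = 6
print(count_dominance(a, b))     # only i = 1 has a_i > b_i, so 1
print(relative_dominance(a, b))  # 0.5 + 4.0 + 1.0 + 0.0 = 5.5
```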
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cormode, G., Muthukrishnan, S. (2003). Estimating Dominance Norms of Multiple Data Streams. In: Di Battista, G., Zwick, U. (eds) Algorithms - ESA 2003. ESA 2003. Lecture Notes in Computer Science, vol 2832. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39658-1_16
DOI: https://doi.org/10.1007/978-3-540-39658-1_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20064-2
Online ISBN: 978-3-540-39658-1
eBook Packages: Springer Book Archive