Abstract
The central goal of data stream algorithms is to process massive streams of data using sublinear storage space. Motivated by work in the database community on outsourcing database and data stream processing, we ask whether the space usage of such algorithms can be further reduced by enlisting a more powerful “helper” who can annotate the stream as it is read. We do not wish to blindly trust the helper, so we require that the algorithm be convinced of having computed a correct answer. We show upper bounds that achieve a non-trivial tradeoff between the amount of annotation used and the space required to verify it. We also prove lower bounds on such tradeoffs, often nearly matching the upper bounds, via notions related to Merlin-Arthur communication complexity. Our results cover the classic data stream problems of selection, frequency moments, and fundamental graph problems such as triangle-freeness and connectivity. Our work is also part of a growing trend (including recent studies of multi-pass streaming, read/write streams, and randomly ordered streams) of asking more complexity-theoretic questions about data stream processing. This trend recognizes that, in addition to its practical relevance, the data stream model raises many interesting theoretical questions in its own right.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chakrabarti, A., Cormode, G., McGregor, A. (2009). Annotations in Data Streams. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S., Thomas, W. (eds) Automata, Languages and Programming. ICALP 2009. Lecture Notes in Computer Science, vol 5555. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02927-1_20
DOI: https://doi.org/10.1007/978-3-642-02927-1_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02926-4
Online ISBN: 978-3-642-02927-1
eBook Packages: Computer Science, Computer Science (R0)