research-article

Trading off space for passes in graph streaming problems

Published: 28 December 2009

Abstract

Data stream processing has recently received increasing attention as a computational paradigm for dealing with massive data sets. Surprisingly, no algorithm that uses both sublinear space and a sublinear number of passes is known for natural graph problems in classical read-only streaming. Motivated by technological factors of modern storage systems, some authors have recently started to investigate the computational power of less restrictive models where writing streams is allowed. In this article, we show that the use of intermediate temporary streams is powerful enough to provide effective space-passes tradeoffs for natural graph problems. In particular, for any space restriction of s bits, we show that single-source shortest paths in directed graphs with small positive integer edge weights can be solved in O((n log^{3/2} n)/√s) passes. The result can be generalized to deal with multiple sources within the same bounds. This is the first known streaming algorithm for shortest paths in directed graphs. For undirected connectivity, we devise an algorithm that uses O((n log n)/s) passes. Both problems require Ω(n/s) passes under the restrictions we consider. We also show that the model where intermediate temporary streams are allowed can be strictly more powerful than classical streaming for some problems, while maintaining all of its hardness for others.
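
To get a rough feel for the tradeoff, the three pass bounds stated in the abstract can be evaluated numerically. The sketch below is only an illustration: it drops the constants hidden by the O and Ω notation, assumes base-2 logarithms, and uses hypothetical function names that do not come from the article. It compares growth rates, not actual pass counts.

```python
import math


def sssp_passes(n: int, s: int) -> float:
    """Growth term of the O((n log^{3/2} n)/sqrt(s)) shortest-paths bound.

    Assumption: base-2 logarithm, hidden constant taken as 1.
    """
    return n * math.log2(n) ** 1.5 / math.sqrt(s)


def connectivity_passes(n: int, s: int) -> float:
    """Growth term of the O((n log n)/s) undirected-connectivity bound."""
    return n * math.log2(n) / s


def lower_bound_passes(n: int, s: int) -> float:
    """Growth term of the Omega(n/s) lower bound for both problems."""
    return n / s


if __name__ == "__main__":
    n = 1 << 20  # hypothetical number of vertices
    for s in (1 << 10, 1 << 14, 1 << 18):  # hypothetical space budgets in bits
        print(
            f"s={s:>7}  sssp~{sssp_passes(n, s):.0f}  "
            f"conn~{connectivity_passes(n, s):.0f}  "
            f"lower~{lower_bound_passes(n, s):.0f}"
        )
```

Under these assumptions, the connectivity term sits a log n factor above the lower-bound term, while the shortest-paths term shrinks only with the square root of s, so extra space buys fewer saved passes there.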

Published In

ACM Transactions on Algorithms, Volume 6, Issue 1
December 2009
374 pages
ISSN:1549-6325
EISSN:1549-6333
DOI:10.1145/1644015
Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 December 2009
Accepted: 01 May 2008
Received: 01 February 2007
Published in TALG Volume 6, Issue 1

Author Tags

  1. Data streaming
  2. graph connectivity
  3. shortest paths

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2023) Graph Stream Sketch: Summarizing Graph Streams With High Speed and Accuracy. IEEE Transactions on Knowledge and Data Engineering 35:6 (5901-5914). DOI: 10.1109/TKDE.2022.3174570. Online publication date: 1-Jun-2023
  • (2023) Efficient Semi-External SCC Computation. IEEE Transactions on Knowledge and Data Engineering 35:4 (3794-3807). DOI: 10.1109/TKDE.2021.3138994. Online publication date: 1-Apr-2023
  • (2023) Algorithms for Big Data Problems in de Novo Genome Assembly. Algorithms for Big Data (229-251). DOI: 10.1007/978-3-031-21534-6_13. Online publication date: 18-Jan-2023
  • (2022) A One Pass Streaming Algorithm for Finding Euler Tours. Theory of Computing Systems 67:4 (671-693). DOI: 10.1007/s00224-022-10077-w. Online publication date: 12-Dec-2022
  • (2021) An analysis of the graph processing landscape. Journal of Big Data 8:1. DOI: 10.1186/s40537-021-00443-9. Online publication date: 9-Apr-2021
  • (2021) Position paper. Proceedings of the 4th ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA) (1-12). DOI: 10.1145/3461837.3464514. Online publication date: 20-Jun-2021
  • (2021) Graph Connectivity in Log Steps Using Label Propagation. Parallel Processing Letters 31:04. DOI: 10.1142/S0129626421500213. Online publication date: 7-Dec-2021
  • (2020) Substream-Centric Maximum Matchings on FPGA. ACM Transactions on Reconfigurable Technology and Systems 13:2 (1-33). DOI: 10.1145/3377871. Online publication date: 24-Apr-2020
  • (2019) Substream-Centric Maximum Matchings on FPGA. Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (152-161). DOI: 10.1145/3289602.3293916. Online publication date: 20-Feb-2019
  • (2019) A Deterministic Almost-Tight Distributed Algorithm for Approximating Single-Source Shortest Paths. SIAM Journal on Computing (STOC16-98-STOC16-137). DOI: 10.1137/16M1097808. Online publication date: 21-Oct-2019