ABSTRACT
This paper gives an overview of recent work on machine models for processing massive amounts of data. The main focus is on generalizations of the classical data stream model in which, in addition to an "internal memory" of limited size, a number of (potentially huge) streams may be used as "external memory devices".
Index Terms
- Machine models and lower bounds for query processing