Testing and Spot-Checking of Data Streams

Algorithmica

Abstract

We consider the tasks of testing and spot-checking for data streams. These testers and spot-checkers are potentially useful in real-time or near-real-time applications that process huge data sets. Crucial aspects of the computational model include the space complexity of the testers and spot-checkers (ideally much lower than the size of the input stream) and the number of passes that the tester or spot-checker must make over the input stream (ideally one, because the original stream may be too large to store for a second pass).
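To make the "one pass, small space" constraint concrete, here is a minimal sketch (not taken from the paper) of a streaming check that reads its input strictly left to right, never stores it, and keeps only one previous element as state. Exact sortedness happens to be decidable this cheaply. The function name `stream_is_sorted` is hypothetical, and the stream is assumed to be any Python iterable of comparable items.

```python
from typing import Iterable, Optional


def stream_is_sorted(stream: Iterable[int]) -> bool:
    """Decide sortedness in one left-to-right pass using O(1) words of state.

    Only the most recent element is remembered; the stream itself is never
    buffered, so a generator that cannot be replayed works fine.
    """
    prev: Optional[int] = None
    for x in stream:
        # An adjacent inversion is enough to reject.
        if prev is not None and x < prev:
            return False
        prev = x
    return True


# Usage: the input can be a one-shot generator, as in a real data stream.
print(stream_is_sorted(x * x for x in range(10)))
```

Properties studied in the paper, such as groupedness and PERMUTATION, do not admit such a trivial constant-space exact check, which is what motivates approximate testers.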

A sampling-tester [GGR] for a property P samples some (but usually not all) of its input and, with high probability, outputs PASS if the input has property P and FAIL if the input is far from having P, for an appropriate sense of "far". A streaming-tester for a property P of one or more input streams takes as input one or more data streams and, with high probability, outputs PASS if the streams have property P and FAIL if the streams are far from having P. A sampling-tester can make its samples in any order; a streaming-tester sees the input from left to right.
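To illustrate the random-access freedom of a sampling-tester, here is a sketch of the well-known binary-search spot-check for sortedness, in the spirit of the sampling spot-checkers of [EKK+]. It is an illustration, not the algorithm of this paper. It assumes the input values are distinct. The function names, the `trials` parameter, and the PASS/FAIL string return values are hypothetical choices for this sketch. The tester jumps to random positions and runs binary searches, an access pattern a left-to-right streaming-tester cannot reproduce.

```python
import random
from typing import Optional, Sequence


def binary_search_reaches(a: Sequence[int], i: int) -> bool:
    """Binary-search for a[i]; True iff the search lands exactly on index i."""
    target = a[i]
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            # With distinct values, finding the target elsewhere is impossible.
            return mid == i
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False


def sampling_sortedness_tester(a: Sequence[int], trials: int,
                               rng: Optional[random.Random] = None) -> str:
    """Sample random positions (in any order) and spot-check each by binary search.

    A sorted array of distinct values always yields PASS (one-sided error).
    An array far from sorted fails many positions, so FAIL is likely.
    """
    rng = rng or random.Random()
    if not a:
        return "PASS"
    for _ in range(trials):
        i = rng.randrange(len(a))
        if not binary_search_reaches(a, i):
            return "FAIL"
    return "PASS"


# Usage: a reversed array is far from sorted and is rejected.
print(sampling_sortedness_tester(list(range(1000))[::-1], 50, random.Random(0)))
```

The number of trials needed for a given distance and confidence is not derived here; `trials` is left as a free parameter.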

We consider the groupedness property (a natural relaxation of the sortedness property). We also revisit the sortedness property, first considered in [EKK+] in the context of sampling spot-checkers, and the property of detecting whether one stream is a permutation of another (either directly or via the SORTED-SUPERSET property, a technical property that is equivalent to PERMUTATION under some conditions). We show that there are properties efficiently testable by a streaming-tester but not by a sampling-tester and other (promise) problems for which the reverse is true.
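For comparison with the PERMUTATION property, the sketch below shows a standard randomized fingerprinting technique that checks in small space whether one stream is a permutation of another. Each stream is read once and compressed to a single field element. This technique is swapped in for illustration. It is not necessarily the paper's approach, which also goes through SORTED-SUPERSET. It assumes the stream items are integers in `[0, P)` for the Mersenne prime `P = 2**61 - 1`, and the function names are hypothetical. Each stream is mapped to the product of `(r - x)` mod `P` for a random `r`. Equal multisets always give equal products. Unequal multisets of length at most `n` collide with probability at most about `n / P`.

```python
import random
from typing import Iterable, Optional

P = (1 << 61) - 1  # Mersenne prime; assumes items are integers in [0, P)


def fingerprint(stream: Iterable[int], r: int) -> int:
    """One pass, O(1) words: evaluate prod_x (r - x) mod P over the stream."""
    acc = 1
    for x in stream:
        acc = acc * ((r - x) % P) % P
    return acc


def streams_look_like_permutations(a: Iterable[int], b: Iterable[int],
                                   rng: Optional[random.Random] = None) -> bool:
    """True if the fingerprints agree at a shared random point r.

    One-sided error: genuine permutations always return True; non-permutations
    return True only if r is a root of the nonzero difference polynomial.
    """
    rng = rng or random.Random()
    r = rng.randrange(P)
    return fingerprint(a, r) == fingerprint(b, r)


# Usage: both inputs may be one-shot iterators.
print(streams_look_like_permutations(iter([3, 1, 2]), iter([2, 3, 1])))
```

Because multiplication is commutative, the order of items within each stream does not matter, which is exactly the invariance PERMUTATION requires.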

Cite this article

Feigenbaum, Kannan, Strauss et al. Testing and Spot-Checking of Data Streams. Algorithmica 34, 67–80 (2002). https://doi.org/10.1007/s00453-002-0959-4