Quantiles on Streams


Synonyms

Histogram; Median; Order statistics; Selection

Definition

Quantiles are order statistics of data: the φ-quantile (0 ≤ φ ≤ 1) of a set S is an element x such that φ|S| elements of S are less than or equal to x and the remaining (1 − φ)|S| are greater than x. This entry describes data stream (single-pass) algorithms for computing an approximation of such quantiles.
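
As a point of reference for the definition above, the exact φ-quantile of a finite set can be read off a sorted copy of the data. The sketch below is only an illustration of the definition, not one of the single-pass algorithms this entry covers: it stores and sorts the whole input. The function name `exact_quantile` and the rounding rule (rank ⌈φ|S|⌉, clamped to at least 1) are assumptions made here, since φ|S| is not always an integer and the definition does not fix a convention.

```python
import math


def exact_quantile(values, phi):
    """Return an element x of `values` whose 1-based rank is ceil(phi * n).

    Hypothetical helper for illustration: the rounding convention
    (ceiling, with rank clamped to at least 1) is an assumption.
    """
    if not 0 <= phi <= 1:
        raise ValueError("phi must lie in [0, 1]")
    ordered = sorted(values)  # O(n log n) time, O(n) space: not a streaming method
    n = len(ordered)
    if n == 0:
        raise ValueError("values must be non-empty")
    rank = max(1, math.ceil(phi * n))  # rank 1 is the minimum, rank n the maximum
    return ordered[rank - 1]


# Usage: the median (phi = 0.5) of five elements is the third smallest.
median = exact_quantile([5, 1, 4, 2, 3], 0.5)
```

Under this convention φ = 0 yields the minimum and φ = 1 the maximum. An approximate streaming algorithm would instead return an element whose rank is close to ⌈φ|S|⌉ while storing far fewer than |S| elements.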

Historical Background

Since the earliest days of data processing, there has been a need to summarize data. Large volumes of raw, unstructured data easily overwhelm the human ability to comprehend them, so tools that help identify the major underlying trends or patterns in data have enormous value. Quantiles characterize distributions of real-world data sets in ways that are less sensitive to outliers than simpler alternatives such as the mean and the variance. Consequently, quantiles are of interest to both database implementers and users: for instance, they are a fundamental tool for query optimization, splitting...



Author information

Correspondence to Chiranjeeb Buragohain.


Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature


Cite this entry

Buragohain, C., Suri, S. (2018). Quantiles on Streams. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_290
