Frugal Streaming for Estimating Quantiles

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8066)

Abstract

Modern applications require processing streams of data to estimate statistical quantities such as quantiles with a small amount of memory. In many such applications, one in fact needs to compute these quantities for each of a large number of groups (e.g., network traffic grouped by source IP address), which further restricts the memory available for the stream of any particular group. We address this challenge and introduce frugal streaming, that is, algorithms that work with a tiny (typically sub-streaming) amount of memory per group.

We design a frugal algorithm that uses only one unit of memory per group to compute a quantile for each group. For stochastic streams, where data items are drawn independently from a distribution, we analyze the algorithm and show that it finds an approximation to the quantile rapidly and remains stably close to it. We also propose an extension of this algorithm that uses two units of memory per group. Experiments with real-world data from an HTTP trace and Twitter show that our frugal algorithms are comparable to existing streaming algorithms for estimating any quantile, whereas those existing algorithms use far more space per group and are unrealistic in frugal applications; further, the two-memory frugal algorithm converges significantly faster than the one-memory algorithm.
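
The abstract does not spell out the update rule, so the following is only a minimal sketch of how a one-unit-of-memory quantile estimator of this kind might look. It assumes integer-valued items, a fixed step of 1, and a randomized nudge toward each new item. The names `frugal_1u`, `q`, `rng`, and `start` are illustrative and not taken from the chapter.

```python
import random


def frugal_1u(stream, q, rng, start=0):
    """Estimate the q-quantile (0 < q < 1) of an integer stream.

    Assumed update rule (not stated in the abstract): keep one
    running estimate m and nudge it by 1 toward each new item,
    upward with probability q and downward with probability 1 - q.
    """
    m = start  # the single unit of memory kept for this group
    for s in stream:
        r = rng.random()  # uniform in [0, 1)
        if s > m and r > 1 - q:
            m += 1  # item above the estimate: step up with probability q
        elif s < m and r > q:
            m -= 1  # item below the estimate: step down with probability 1 - q
    return m


if __name__ == "__main__":
    data_rng = random.Random(0)
    # Hypothetical stochastic stream: items drawn independently, uniform on 0..999.
    data = [data_rng.randrange(1000) for _ in range(20000)]
    print("median estimate:", frugal_1u(data, 0.5, random.Random(1)))
    print("90th percentile estimate:", frugal_1u(data, 0.9, random.Random(2)))
```

Under these assumptions the estimate drifts upward while fewer than a fraction q of the items fall below it and downward otherwise, so it settles near the target quantile. With many groups, one such value would be stored per group, for example in a dictionary keyed by source IP address.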




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Ma, Q., Muthukrishnan, S., Sandler, M. (2013). Frugal Streaming for Estimating Quantiles. In: Brodnik, A., López-Ortiz, A., Raman, V., Viola, A. (eds) Space-Efficient Data Structures, Streams, and Algorithms. Lecture Notes in Computer Science, vol 8066. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40273-9_7

  • DOI: https://doi.org/10.1007/978-3-642-40273-9_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40272-2

  • Online ISBN: 978-3-642-40273-9

  • eBook Packages: Computer Science, Computer Science (R0)
