On the space–time of optimal, approximate and streaming algorithms for synopsis construction problems

  • Regular Paper
The VLDB Journal

Abstract

Synopsis construction algorithms have been found to be of interest in query optimization, approximate query answering and mining, and over the last few years several good synopsis construction algorithms have been proposed. These algorithms have mostly focused on the running time of the synopsis construction vis-a-vis the synopsis quality. However, the space complexity of synopsis construction algorithms has not been investigated as thoroughly. Many of the optimum synopsis construction algorithms are expensive in space. For some of these algorithms, the space required to construct the synopsis is significantly larger than the space required to store the input. These algorithms rely on the fact that they require a smaller “working space” and most of the data can be resident on disc. The large space complexity of synopsis construction algorithms is a handicap in several scenarios. In the case of streaming algorithms, space is a fundamental constraint. In the case of offline optimal or approximate algorithms, a better space complexity often makes these algorithms much more attractive by allowing them to run in main memory without using disc, or alternatively by allowing them to scale to significantly larger problems without running out of space. In this paper, we propose a simple and general technique that reduces the space complexity of synopsis construction algorithms. As a consequence, we show that the notion of “working space” proposed in these contexts is redundant. This technique can be easily applied to many existing algorithms for synopsis construction problems. We demonstrate the performance benefits of our proposal through experiments on real-life and synthetic data. We believe that our algorithm also generalizes to a broader range of dynamic programs beyond synopsis construction.

References

  1. Acharya, S., Gibbons, P., Poosala, V., Ramaswamy, S.: The aqua approximate query answering system. In: Proc. of ACM SIGMOD (1999)

  2. Aggarwal, A., Alpern, B., Chandra, A., Snir, M.: A model for hierarchical memory. In: Proceedings of the Symposium on Theory of Computing (STOC), pp. 305–314 (1987)

  3. Amsaleg, L., Bonnet, P., Franklin, M.J., Tomasic, A., Urhan, T.: Improving responsiveness for wide-area data access. IEEE Data Eng. 20(3), 3–11 (1997)

  4. Arge, L.: External memory data structures. In: Proc. of ESA, pp. 1–29 (2001)

  5. Chakrabarti, K., Garofalakis, M.N., Rastogi, R., Shim, K.: Approximate query processing using wavelets. In: Proceedings of the International Conference on Very Large Databases (VLDB) (2000)

  6. Chakrabarti, K., Keogh, E.J., Mehrotra, S., Pazzani, M.J.: Locally adaptive dimensionality reduction for indexing large time series databases. ACM TODS 27(2), 188–228 (2002)

  7. Chou, H.T., DeWitt, D.J.: An evaluation of buffer management strategies for relational database systems. pp. 127–141 (1985)

  8. Daubechies, I.: Ten lectures on wavelets. SIAM (1992)

  9. Deligiannakis, A., Roussopoulos, N.: Extended wavelets for multiple measures. In: SIGMOD Conference (2003)

  10. Denning, P.J.: The working set model for program behaviour. CACM 11(5), 323–333 (1968)

  11. Garofalakis, M., Kumar, A.: Deterministic wavelet thresholding for maximum error metric. In: Proc. of PODS (2004)

  12. Garofalakis, M.N., Gibbons, P.B.: Wavelet synopses with error guarantees. In: Proc. of ACM SIGMOD (2002)

  13. Garofalakis, M.N., Gibbons, P.B.: Probabilistic wavelet synopses. ACM TODS 29, 43–90 (2004)

  14. Gibbons, P., Matias, Y.: Synopsis data structures for massive data sets. In: Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, pp. S909–S910 (1999)

  15. Gibbons, P., Matias, Y., Poosala, V.: Fast incremental maintenance of approximate histograms. In: Proc. of VLDB, Athens, pp. 466–475 (1997)

  16. Gilbert, A., Kotidis, Y., Muthukrishnan, S., Strauss, M.: Surfing wavelets on streams: one pass summaries for approximate aggregate queries. In: Proceedings of VLDB, pp. 79–88 (2001)

  17. Gilbert, A.C., Guha, S., Indyk, P., Kotidis, Y., Muthukrishnan, S., Strauss, M.: Fast, small-space algorithms for approximate histogram maintenance. In: Proc. of ACM STOC (2002)

  18. Gilbert, A.C., Kotidis, Y., Muthukrishnan, S., Strauss, M.: Optimal and approximate computation of summary statistics for range aggregates. In: Proc. of ACM PODS (2001)

  19. Guha, S.: Space efficiency in synopsis construction algorithms. In: Proceedings of the International Conference on Very Large Databases (VLDB) (2005)

  20. Guha, S., Harb, B.: Wavelet synopsis for data streams: minimizing non-euclidean error. In: Proc. of SIGKDD Conference (2005)

  21. Guha, S., Harb, B.: Approximation algorithms for wavelet transform coding of data streams. In: Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA) (2006)

  22. Guha, S., Indyk, P., Muthukrishnan, S., Strauss, M.: Histogramming data streams with fast per-item processing. In: Proc. of ICALP (2002)

  23. Guha, S., Kim, C., Shim, K.: XWAVE: Optimal and approximate extended wavelets for streaming data. In: Proceedings of the International Conference on Very Large Databases (VLDB) (2004)

  24. Guha, S., Kim, C., Shim, K.: XWAVE: Optimal and approximate extended wavelets for streaming data. Extended version of [23] (2005)

  25. Guha, S., Koudas, N.: Approximating a data stream for querying and estimation: algorithms and performance evaluation. In: Proc. of ICDE (2002)

  26. Guha, S., Koudas, N., Shim, K.: Data streams and histograms. In: Proc. of STOC (2001)

  27. Guha, S., Koudas, N., Shim, K.: Approximation algorithms for histogram construction problems. To appear in TODS. This is the full version of [26], available at http://www.cis.upenn.edu/sudipto/mypapers/histjour.pdf.gz (2005)

  28. Guha, S., Koudas, N., Srivastava, D.: Fast algorithms for hierarchical range histogram construction. In: Proc. of ACM PODS (2002)

  29. Guha, S., Park, H., Shim, K.: Wavelet synopsis for hierarchical range queries with workloads. Manuscript. Email for copy (2005)

  30. Guha, S., Shim, K., Woo, J.: REHIST: Relative error histogram construction algorithms. In: Proceedings of the International Conference on Very Large Databases (VLDB) (2004)

  31. Hellerstein, J.M., Haas, P.J., Wang, H.J.: Online aggregation. In: SIGMOD Conference (1997)

  32. Hirschberg, D.S.: A linear space algorithm for computing maximal common subsequences. CACM 18(6), 341–343 (1975)

  33. Hochbaum, D. (ed.): Approximation Algorithms for NP Hard Problems. Brooks/Cole (1996)

  34. Ioannidis, Y., Poosala, V.: Balancing histogram optimality and practicality for query result size estimation. In: Proc. of ACM SIGMOD (1995)

  35. Ioannidis, Y.E.: Universality of serial histograms. In: Proc. of the VLDB Conference (1993)

  36. Ioannidis, Y.E.: The history of histograms (abridged). In: Proc. of VLDB Conference, pp. 19–30 (2003)

  37. Jagadish, H.V., Koudas, N., Muthukrishnan, S., Poosala, V., Sevcik, K.C., Suel, T.: Optimal Histograms with Quality Guarantees. In: Proc. of the VLDB Conference (1998)

  38. Jagadish, H.V., Lakshmanan, V.S., Srivastava, D.: What can hierarchies do for data warehouse? In: Proceedings of the International Conference on Very Large Databases (VLDB) (1999)

  39. Karras, P., Mamoulis, N.: One pass wavelet synopsis for maximum error metrics. In: Proceedings of the International Conference on Very Large Databases (VLDB) (2005)

  40. Kooi, R.P.: The optimization of queries in relational databases. PhD Thesis, Case Western Reserve University (1980)

  41. Koudas, N., Muthukrishnan, S., Srivastava, D.: Optimal histograms for hierarchical range queries. In: Proc. of ACM PODS (2000)

  42. Matias, Y., Urieli, D.: Personal communication (2004)

  43. Matias, Y., Vitter, J.S., Wang, M.: Wavelet-based histograms for selectivity estimation. In: Proc. of ACM SIGMOD (1998)

  44. Muralikrishna, M., DeWitt, D.J.: Equi-depth histograms for estimating selectivity factors for multidimensional queries. In: Proc. of ACM SIGMOD, Chicago, pp. 28–36 (1988)

  45. Muthukrishnan, S.: Workload optimal wavelet synopsis. DIMACS TR (2004)

  46. Muthukrishnan, S., Strauss, M.: Approximate histogram and wavelet summaries of streaming data. DIMACS TR 52 (2003)

  47. Muthukrishnan, S., Strauss, M.: Rangesum histograms. In: Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA) (2003)

  48. Poosala, V., Ioannidis, Y., Haas, P., Shekita, E.: Improved histograms for selectivity estimation of range predicates. In: Proc. of ACM SIGMOD, Montreal, pp. 294–305 (1996)

  49. Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: Proc. of the ACM SIGMOD, pp. 23–34 (1979)

  50. Smith, A.J.: Cache memories. ACM Comput. Surv. 14(3), 473–530 (1982)

  51. Vazirani, V.: Approximation Algorithms. Springer, Heidelberg (2001)

  52. Vitter, J.S.: External memory algorithms and data structures: dealing with massive data. ACM Comput. Surv. 33(2), 209–271 (2001)

Author information

Corresponding author

Correspondence to Sudipto Guha.

Additional information

Sudipto Guha’s research was supported in part by an Alfred P. Sloan Research Fellowship and by NSF Awards CCF-0430376 and CCF-0644119. A preliminary version of the paper appeared as “Space efficiency in synopsis construction algorithms”, VLDB Conference 2005, Trondheim [19].

About this article

Cite this article

Guha, S. On the space–time of optimal, approximate and streaming algorithms for synopsis construction problems. The VLDB Journal 17, 1509–1535 (2008). https://doi.org/10.1007/s00778-007-0083-9

  • DOI: https://doi.org/10.1007/s00778-007-0083-9
