
Workload-aware wavelet synopses for sliding window aggregates

Distributed and Parallel Databases

Abstract

In this work, we study the problem of maintaining basic aggregate statistics over a sliding-window data stream under limited memory. Because the memory available in IoT scenarios is typically much smaller than the window size, queries are answered from compact synopses that are maintained online. To construct such synopses efficiently, we propose wavelet-based algorithms that provide deterministic guarantees and produce near-exact results for a variety of data distributions. We also show how accuracy can be further improved when workload information is known. For this purpose, we propose a workload-aware streaming system that trades off accuracy against synopsis construction throughput. Our experiments indicate that, with only a \(15\%\) penalty in throughput, the proposed system produces fairly accurate results even for the most adversarial distributions.




Notes

  1. https://www.arduino.cc/.

  2. In prefix range queries, the start (or end) of a range is always the same for all queries of the workload.

  3. https://www.arduino.cc/en/reference/SD.

  4. The size of a partition is of the form \(s=2^k-1, k>0\).

  5. https://github.com/dain/leveldb.


Author information


Corresponding author

Correspondence to Ioannis Mytilinis.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Mytilinis, I., Tsoumakos, D. & Koziris, N. Workload-aware wavelet synopses for sliding window aggregates. Distrib Parallel Databases 39, 445–482 (2021). https://doi.org/10.1007/s10619-020-07307-w


  • DOI: https://doi.org/10.1007/s10619-020-07307-w
