Abstract
In this work, we study the problem of maintaining basic aggregate statistics over a sliding-window data stream under limited memory. Because in IoT scenarios the available memory is typically much smaller than the window size, queries are answered from compact synopses that are maintained online. For the efficient construction of such synopses, we propose wavelet-based algorithms that provide deterministic guarantees and produce near-exact results for a variety of data distributions. We further show how accuracy can be improved when workload information is known. For this purpose, we propose a workload-aware streaming system that trades off accuracy against synopsis construction throughput. Our experiments indicate that, with only a \(15\%\) penalty in throughput, the proposed system produces fairly accurate results even for the most adversarial distributions.
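To make the notion of a wavelet synopsis concrete, the following is a minimal sketch of a Haar decomposition of one window snapshot, followed by conventional greedy thresholding that keeps the coefficients with the largest normalized magnitude. It is not the algorithm proposed in this work: it rebuilds the synopsis from a full window instead of maintaining it online, and it targets overall L2 error rather than a deterministic per-query guarantee. The function names (`haar_decompose`, `threshold`, `approx_range_sum`) and the `budget` parameter are illustrative assumptions.

```python
import math
from typing import Dict, List


def haar_decompose(data: List[float]) -> List[float]:
    """Unnormalized Haar transform (pairwise averages and half-differences).

    Assumes len(data) is a power of two.
    """
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [0.0] * n
    current = [float(x) for x in data]
    while len(current) > 1:
        half = len(current) // 2
        averages = [(current[2 * i] + current[2 * i + 1]) / 2 for i in range(half)]
        details = [(current[2 * i] - current[2 * i + 1]) / 2 for i in range(half)]
        coeffs[half:2 * half] = details  # detail coefficients of this level
        current = averages
    coeffs[0] = current[0]  # overall average of the window
    return coeffs


def haar_reconstruct(coeffs: List[float]) -> List[float]:
    """Invert haar_decompose."""
    current = [coeffs[0]]
    while len(current) < len(coeffs):
        half = len(current)
        nxt: List[float] = []
        for avg, det in zip(current, coeffs[half:2 * half]):
            nxt.extend((avg + det, avg - det))
        current = nxt
    return current


def threshold(coeffs: List[float], budget: int) -> Dict[int, float]:
    """Keep the `budget` coefficients with the largest normalized magnitude.

    Conventional L2 thresholding, used here only for illustration.
    """
    n = len(coeffs)

    def weight(i: int) -> float:
        # Number of window values that coefficient i influences.
        support = n if i == 0 else n >> (i.bit_length() - 1)
        return abs(coeffs[i]) * math.sqrt(support)

    keep = sorted(range(n), key=weight, reverse=True)[:budget]
    return {i: coeffs[i] for i in keep if coeffs[i] != 0.0}


def approx_range_sum(synopsis: Dict[int, float], n: int, lo: int, hi: int) -> float:
    """Approximate sum of window positions lo..hi (inclusive) from the synopsis."""
    dense = [synopsis.get(i, 0.0) for i in range(n)]
    return sum(haar_reconstruct(dense)[lo:hi + 1])


window = [2, 2, 0, 2, 3, 5, 4, 4]
synopsis = threshold(haar_decompose(window), budget=4)
print(approx_range_sum(synopsis, len(window), 2, 5), sum(window[2:6]))
```

With a budget of four coefficients, the range sum over positions 2 to 5 is estimated as 11 against an exact value of 10. The sum over the whole window is reproduced exactly whenever the overall-average coefficient is retained, because each detail coefficient contributes equal positive and negative amounts across its support.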
Notes
In prefix range queries, the start (or end) of a range is always the same for all queries of the workload.
The size of a partition is of the form \(s=2^k-1, k>0\).
Cite this article
Mytilinis, I., Tsoumakos, D. & Koziris, N. Workload-aware wavelet synopses for sliding window aggregates. Distrib Parallel Databases 39, 445–482 (2021). https://doi.org/10.1007/s10619-020-07307-w
DOI: https://doi.org/10.1007/s10619-020-07307-w