Abstract
Constructing Haar wavelet synopses with a guaranteed maximum error on the approximated data has many real-world applications. In this paper, we take a novel approach to constructing unrestricted Haar wavelet synopses under the maximum error metric (L∞). We first provide two linear-time (log N)-approximation algorithms, which have space complexities of O(log N) and O(N), respectively. Both algorithms are simple in structure and adapt naturally to stream data processing. Unlike traditional approaches to synopsis construction, which rely heavily on examining wavelet coefficients and their summations, the proposed methods are compact, scalable, and well suited to online data processing. We then demonstrate that the technique can be extended to other structures such as the Haar+ tree. Extensive experiments indicate that these techniques are highly practical. The proposed algorithms achieve an attractive tradeoff between efficiency and effectiveness, surpassing contemporary (log N)-approximation algorithms in compression quality.
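For readers new to the setting, the following minimal sketch illustrates what a Haar wavelet synopsis and its maximum (L∞) error are. It is not one of the algorithms proposed in the paper: it builds a restricted synopsis by naively keeping the largest-magnitude Haar coefficients, whereas an unrestricted synopsis may store arbitrary real values. The function names, the sample data, and the budget of three coefficients are assumptions made for illustration only.

```python
def haar_decompose(data):
    """Unnormalized Haar transform: [overall average, details coarse to fine]."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    averages = [float(x) for x in data]
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # Detail = half the difference (left minus right) of each pair.
        level_details = [(a - b) / 2 for a, b in pairs]
        averages = [(a + b) / 2 for a, b in pairs]
        details = level_details + details  # coarser levels go in front
    return averages + details


def haar_reconstruct(coeffs):
    """Invert haar_decompose: left child = avg + detail, right = avg - detail."""
    values = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        level = coeffs[pos:pos + len(values)]
        values = [v for avg, d in zip(values, level) for v in (avg + d, avg - d)]
        pos += len(level)
    return values


def keep_largest(coeffs, budget):
    """Naive restricted synopsis: zero all but the `budget` largest coefficients."""
    order = sorted(range(len(coeffs)), key=lambda i: -abs(coeffs[i]))
    keep = set(order[:budget])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


def max_abs_error(data, approx):
    """Maximum (L-infinity) error between the data and its approximation."""
    return max(abs(x - y) for x, y in zip(data, approx))


data = [2, 2, 0, 2, 3, 5, 4, 4]          # illustrative input, not from the paper
coeffs = haar_decompose(data)
synopsis = keep_largest(coeffs, budget=3)
approx = haar_reconstruct(synopsis)
print("coefficients:", coeffs)
print("synopsis:    ", synopsis)
print("max error:   ", max_abs_error(data, approx))
```

Keeping the largest coefficients gives no guarantee on the maximum error, which is the gap that error-bounded constructions such as those described in the abstract aim to close.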
References
Chakrabarti, K., Garofalakis, M., Rastogi, R., Shim, K.: Approximate query processing using wavelets. VLDB J. 10(2–3), 199–223 (2001)
Garofalakis, M., Gibbons, P.B.: Probabilistic wavelet synopses. ACM Trans. Database Syst. 29(1), 43–90 (2004). doi:10.1145/974750.974753
Guha, S.: Space efficiency in synopsis construction algorithms. In: VLDB ’05: Proceedings of the 31st International Conference on Very Large Data Bases, pp. 409–420. ACM, New York (2005)
Guha, S., Harb, B.: Wavelet synopsis for data streams: minimizing non-Euclidean error. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, KDD ’05, pp. 88–97. ACM, New York (2005). doi:10.1145/1081870.1081884
Guha, S., Harb, B.: Approximation algorithms for wavelet transform coding of data streams. IEEE Trans. Inf. Theory 54(2), 811–830 (2008). doi:10.1109/TIT.2007.913569
Guha, S., Shim, K., Woo, J.: Rehist: relative error histogram construction algorithms. In: VLDB ’04: Proceedings of the Thirtieth International Conference on Very Large Data Bases, pp. 300–311. Morgan Kaufmann, San Mateo (2004)
Karras, P., Mamoulis, N.: One-pass wavelet synopses for maximum-error metrics. In: VLDB ’05: Proceedings of the 31st International Conference on Very Large Data Bases, pp. 421–432. ACM, New York (2005)
Karras, P., Mamoulis, N.: Hierarchical synopses with optimal error guarantees. ACM Trans. Database Syst. 33(3), 1–53 (2008). doi:10.1145/1386118.1386124
Karras, P., Sacharidis, D., Mamoulis, N.: Exploiting duality in summarization with deterministic guarantees. In: KDD ’07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 380–389. ACM, New York (2007). doi:10.1145/1281192.1281235
Matias, Y., Urieli, D.: Optimal workload-based weighted wavelet synopses. In: Proceedings of International Conference on Database Theory (ICDT), pp. 368–382 (2005)
Matias, Y., Vitter, J.S., Wang, M.: Wavelet-based histograms for selectivity estimation. SIGMOD Rec. 27(2), 448–459 (1998). doi:10.1145/276305.276344
Muthukrishnan, S.: Subquadratic algorithms for workload-aware Haar wavelet synopses. In: Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS), pp. 285–296 (2005)
Pang, C., Zhang, Q., Hansen, D., Maeder, A.: Building data synopses within a known maximum error bound. In: APWeb/WAIM’07: Proceedings of the Joint 9th Asia-Pacific Web and 8th International Conference on Web-Age Information Management Conference on Advances in Data and Web Management, pp. 463–470. Springer, Berlin (2007)
Pang, C., Zhang, Q., Hansen, D., Maeder, A.: Unrestricted wavelet synopses under maximum error bound. In: EDBT ’09: Proceedings of the 12th International Conference on Extending Database Technology, pp. 732–743. ACM, New York (2009). doi:10.1145/1516360.1516445
Reiss, F., Garofalakis, M., Hellerstein, J.M.: Compact histograms for hierarchical identifiers. In: Proceedings of the 32nd International Conference on Very Large Data Bases, VLDB ’06, pp. 870–881. ACM, New York (2006). http://portal.acm.org/citation.cfm?id=1182635.1164202
Stollnitz, E.J., Derose, T.D., Salesin, D.H.: Wavelets for Computer Graphics: Theory and Applications. Morgan Kaufmann, San Francisco (1996)
UCI KDD archive. http://kdd.ics.uci.edu
Vitter, J.S., Wang, M., Iyer, B.: Data cube approximation and histograms via wavelets. In: CIKM ’98: Proceedings of the Seventh International Conference on Information and Knowledge Management, pp. 96–104. ACM, New York (1998). doi:10.1145/288627.288645
Zhang, Q., Pang, C., Hansen, D.: On multidimensional wavelet synopses for maximum error bounds. In: DASFAA ’09: Proceedings of the 14th International Conference on Database Systems for Advanced Applications, pp. 646–661 (2009)
Additional information
Part of the results in this paper appeared in Proceedings of the 12th International Conference on Extending Database Technology (EDBT) [14].
Cite this article
Pang, C., Zhang, Q., Zhou, X. et al. Computing Unrestricted Synopses Under Maximum Error Bound. Algorithmica 65, 1–42 (2013). https://doi.org/10.1007/s00453-011-9571-9
DOI: https://doi.org/10.1007/s00453-011-9571-9