Abstract
Recently, several database-based applications have emerged that are remote from their data sources and need accurate histograms for query cardinality estimation. Traditional approaches to constructing histograms require complete access to the data and are I/O and network intensive, so they no longer apply to these applications. Recent approaches use queries and their feedback to construct and maintain "workload-aware" histograms. However, these approaches either employ heuristics, and therefore provide no guarantees on overall histogram accuracy, or rely on detailed query feedback, which makes them too expensive to use. In this paper, we propose a novel, incremental method for constructing histograms that uses minimum feedback and guarantees minimum overall residual error. Experiments on real, high-dimensional data show 30–40% higher estimation accuracy than currently known heuristic approaches, which translates into a significant performance improvement for remote applications.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Malik, T., Burns, R. (2008). Workload-Aware Histograms for Remote Applications. In: Song, IY., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2008. Lecture Notes in Computer Science, vol 5182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85836-2_38
DOI: https://doi.org/10.1007/978-3-540-85836-2_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85835-5
Online ISBN: 978-3-540-85836-2
eBook Packages: Computer Science (R0)