Workload-Aware Histograms for Remote Applications

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5182)

Abstract

Recently, several database applications have emerged that are remote from their data sources and need accurate histograms for query cardinality estimation. Traditional approaches to constructing histograms require complete access to the data and are I/O and network intensive, and therefore no longer apply to these applications. Recent approaches use queries and their feedback to construct and maintain “workload-aware” histograms. However, these approaches either employ heuristics, and so provide no guarantees on overall histogram accuracy, or rely on detailed query feedback, which makes them too expensive to use. In this paper, we propose a novel, incremental method for constructing histograms that uses minimum feedback and guarantees minimum overall residual error. Experiments on real, high-dimensional data show 30–40% higher estimation accuracy than currently known heuristic approaches, which translates to significant performance improvement for remote applications.

Editor information

Il-Yeol Song, Johann Eder, Tho Manh Nguyen

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Malik, T., Burns, R. (2008). Workload-Aware Histograms for Remote Applications. In: Song, IY., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2008. Lecture Notes in Computer Science, vol 5182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85836-2_38

  • DOI: https://doi.org/10.1007/978-3-540-85836-2_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85835-5

  • Online ISBN: 978-3-540-85836-2

  • eBook Packages: Computer Science, Computer Science (R0)
