Time-Interval Sampling for Improved Estimations in Data Warehouses

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2454)

Abstract

In large data warehouses, pre-computed sampling summaries can return very fast approximate answers to user queries and are well suited to all types of exploratory analysis. However, their use is constrained by the need for a representative number of samples in each grouping interval to yield acceptable accuracy. In this paper we propose and evaluate a technique that addresses this representation issue by using time-interval-biased stratified samples (TISS). The technique delivers fast, accurate analysis by taking advantage of the importance of the time dimension in most user analyses. It is designed as a transparent middle layer that analyzes and rewrites each query to use a summary instead of the base data warehouse. The estimates and error bounds returned by the technique are compared with those of traditional sampling summaries, showing a significant improvement in accuracy.
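The abstract does not give the sampling procedure, so the following is only a minimal sketch of the general idea of a time-biased stratified sample, not the TISS algorithm itself. It assumes that each time interval is one stratum, that each stratum has its own sampling rate (for example, higher for recent periods), and that sums are estimated by scaling each sampled row by the inverse of its stratum's sampling fraction. The names `build_time_stratified_sample`, `estimate_sum`, `stratum_of`, and `rate_of` are hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict


def build_time_stratified_sample(rows, stratum_of, rate_of, seed=0):
    """Draw a stratified sample from rows of (timestamp, value).

    stratum_of maps a timestamp to a time-interval key (assumption:
    one stratum per interval). rate_of maps that key to a sampling
    rate in (0, 1]. Returns a list of (timestamp, value, weight).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[stratum_of(row[0])].append(row)

    sample = []
    for key, members in strata.items():
        # Keep at least one row per non-empty interval so that no
        # interval is left without any representation.
        n = max(1, round(rate_of(key) * len(members)))
        n = min(n, len(members))
        weight = len(members) / n  # scale-up factor N_h / n_h
        for ts, value in rng.sample(members, n):
            sample.append((ts, value, weight))
    return sample


def estimate_sum(sample, predicate=lambda ts: True):
    """Estimate SUM(value) over the rows whose timestamp passes predicate."""
    return sum(value * weight for ts, value, weight in sample if predicate(ts))


if __name__ == "__main__":
    # Hypothetical data: three years, 100 rows each.
    rows = [(2000 + i % 3, float(i)) for i in range(300)]
    # Assumed bias: sample the most recent year more densely.
    sample = build_time_stratified_sample(
        rows,
        stratum_of=lambda ts: ts,
        rate_of=lambda year: 0.5 if year == 2002 else 0.1,
    )
    exact = sum(v for ts, v in rows if ts == 2002)
    approx = estimate_sum(sample, predicate=lambda ts: ts == 2002)
    print(f"exact={exact:.1f} estimate={approx:.1f}")
```

Under these assumptions, a query restricted to a recent interval draws on more sampled rows than it would under a uniform sample of the same total size, which is the kind of representation problem the abstract refers to. A query-rewriting layer would replace the base table with the sample and multiply aggregates by the stored weights.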

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Furtado, P., Costa, J.P. (2002). Time-Interval Sampling for Improved Estimations in Data Warehouses. In: Kambayashi, Y., Winiwarter, W., Arikawa, M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2002. Lecture Notes in Computer Science, vol 2454. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46145-0_32

  • DOI: https://doi.org/10.1007/3-540-46145-0_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44123-6

  • Online ISBN: 978-3-540-46145-6

  • eBook Packages: Springer Book Archive
