Time-Interval Sampling for Improved Estimations in Data Warehouses

Furtado, Pedro; Costa, João Pedro

doi:10.1007/3-540-46145-0_32

Conference paper
First Online: 01 January 2002

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2454)

Abstract

In large data warehouses it is possible to return very fast approximate answers to user queries using pre-computed sampling summaries, which are well suited to all types of exploratory analysis. However, their usage is constrained by the fact that each grouping interval must contain a representative number of sampled rows to yield acceptable accuracy. In this paper we propose and evaluate a technique that deals with this representation issue by using time interval-biased stratified samples (TISS). The technique delivers fast, accurate analysis to the user by taking advantage of the importance of the time dimension in most user analyses. It is designed as a transparent middle layer that analyzes each query and rewrites it to use a summary instead of the base data warehouse. The estimations and error bounds returned using the technique are compared to those of traditional sampling summaries, showing that it achieves a significant improvement in accuracy.
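
The abstract does not spell out how TISS allocates rows to time intervals or how its error bounds are computed, so the following is only a minimal sketch under stated assumptions, not the authors' method. It treats each time interval as a stratum, draws a fixed number of rows (`per_interval`) uniformly from each, and estimates a per-interval SUM with a standard normal-approximation confidence bound that includes the finite-population correction. The function names and parameters (`build_time_stratified_sample`, `estimate_sum`, `per_interval`, `z`) are hypothetical, and the query-rewriting middle layer is omitted.

```python
import math
import random
from collections import defaultdict


def build_time_stratified_sample(rows, interval_of, per_interval, seed=0):
    """Hypothetical helper: group rows by time interval (one stratum per
    interval) and draw a uniform random sample of up to `per_interval` rows
    from each stratum. Returns {interval: (stratum_size, sampled_rows)}."""
    if per_interval < 1:
        raise ValueError("per_interval must be at least 1")
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[interval_of(row)].append(row)
    summary = {}
    for interval, members in strata.items():
        k = min(per_interval, len(members))
        summary[interval] = (len(members), rng.sample(members, k))
    return summary


def estimate_sum(summary, interval, value_of, z=1.96):
    """Hypothetical helper: estimate SUM(value) over one time interval from
    its stratum sample. Returns (estimate, half_width); half_width is a
    normal-approximation bound (z=1.96 is roughly 95%) with finite-population
    correction."""
    stratum_size, sample = summary[interval]
    n = len(sample)
    values = [value_of(r) for r in sample]
    if n == stratum_size:
        # The whole interval was kept, so the answer is exact.
        return float(sum(values)), 0.0
    mean = sum(values) / n
    estimate = stratum_size * mean
    if n < 2:
        # Variance cannot be estimated from a single sampled row.
        return estimate, math.inf
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    fpc = (stratum_size - n) / stratum_size  # finite-population correction
    half_width = z * stratum_size * math.sqrt(fpc * variance / n)
    return estimate, half_width


# Usage with made-up fact rows: (day_number, amount), grouped into 30-day intervals.
rows = [(day, float(day % 7)) for day in range(90)]
summary = build_time_stratified_sample(rows, lambda r: r[0] // 30, per_interval=10)
for interval in sorted(summary):
    est, hw = estimate_sum(summary, interval, lambda r: r[1])
    print(f"interval {interval}: SUM ~ {est:.1f} +/- {hw:.1f}")
```

The design choice illustrated here is the one the abstract motivates: a uniform sample of the same total size would give each interval only about 10 rows on average, with no guarantee for any particular interval, whereas fixing the count per time interval ensures every time-based group has enough sampled rows to support an estimate.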

Author information

Authors and Affiliations

Dep. Engenharia Informática, Universidade de Coimbra, Polo II, Pinhal de Marrocos, 3030, Coimbra, Portugal
Pedro Furtado
Dep. Informática e de Sistemas, Instituto Superior de Engenharia de Coimbra, Quinta da Nora, Rua Pedro Nunes, 3030-119, Coimbra, Portugal
João Pedro Costa

Editor information

Editors and Affiliations

Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, 606-8501, Kyoto, Japan
Yahiko Kambayashi
Institute for Computer Science and Business Informatics, University of Vienna, Liebiggasse 4, 1010, Vienna, Austria
Werner Winiwarter
Center for Spatial Information Science (CSIS), University of Tokyo, 4-6-1, Komaba, Meguro-ku, 153-8904, Tokyo, Japan
Masatoshi Arikawa

About this paper

Cite this paper

Furtado, P., Costa, J.P. (2002). Time-Interval Sampling for Improved Estimations in Data Warehouses. In: Kambayashi, Y., Winiwarter, W., Arikawa, M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2002. Lecture Notes in Computer Science, vol 2454. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46145-0_32

DOI: https://doi.org/10.1007/3-540-46145-0_32
Published: 02 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44123-6
Online ISBN: 978-3-540-46145-6
eBook Packages: Springer Book Archive