Estimating Sufficient Sample Sizes for Approximate Decision Support Queries

Rudra, Amit; Gopalan, Raj P.; Achuthan, N. R.

doi:10.1007/978-3-319-09492-2_6

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 190)

Included in the following conference series:

International Conference on Enterprise Information Systems

Abstract

Sampling schemes for approximate processing of highly selective decision support queries must retrieve a sufficient number of records to provide reliable results within acceptable error limits. The k-MDI tree is an index structure that supports drawing rich samples of relevant records for a given set of dimensional attribute ranges. This paper describes a method for estimating sufficient sample sizes for decision support queries based on inverse simple random sampling without replacement (SRSWOR). Combined with a k-MDI tree index, this method is shown to offer a reliable approach to approximate query processing for decision support.
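As a rough illustration of the inverse SRSWOR idea named in the abstract, the sketch below draws records without replacement until a target number of relevant records has been seen, so the sample size is an outcome of the procedure rather than fixed in advance. It is not the authors' method and does not model the k-MDI tree. The function name `inverse_srswor`, the predicate `is_relevant`, the target count `m`, and the toy data are all hypothetical, and the (m - 1)/(n - 1) proportion estimator is the one commonly used with inverse sampling, assumed here for illustration.

```python
import random


def inverse_srswor(population, is_relevant, m, seed=None):
    """Sample without replacement until m relevant records are found.

    Returns (n, p_hat, reached):
      n       -- number of records drawn
      p_hat   -- estimated proportion of relevant records
      reached -- False if the population ran out before m hits
    """
    if m < 2:
        raise ValueError("m must be at least 2 for the (m - 1)/(n - 1) estimator")
    if not population:
        raise ValueError("population must be non-empty")

    rng = random.Random(seed)
    # A shuffled index order is equivalent to drawing without replacement.
    order = list(range(len(population)))
    rng.shuffle(order)

    hits = 0
    for n, idx in enumerate(order, start=1):
        if is_relevant(population[idx]):
            hits += 1
            if hits == m:
                # n >= m >= 2 here, so the denominator is never zero.
                return n, (m - 1) / (n - 1), True

    # Every record was inspected, so the proportion is exact.
    return len(population), hits / len(population), False


if __name__ == "__main__":
    # Hypothetical toy table: 2% of records match a selective predicate.
    records = [{"region": "WA" if i % 50 == 0 else "other"} for i in range(10_000)]
    n, p_hat, reached = inverse_srswor(
        records, lambda r: r["region"] == "WA", m=30, seed=1
    )
    print(n, p_hat, reached)
```

The more selective the predicate, the more records such a scan must read before reaching `m` hits, which is presumably why the abstract pairs the sampling scheme with an index that can target relevant records directly.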

Author information

Authors and Affiliations

School of Information Systems, Curtin University, Perth, Australia
Amit Rudra
Department of Computing, Curtin University, Perth, Australia
Raj P. Gopalan
Independent Consultant, Perth, Australia
N. R. Achuthan

Corresponding author

Correspondence to Raj P. Gopalan.

Editor information

Editors and Affiliations

Groupe ESEO, Angers, France
Slimane Hammoudi
Polytechnic Institute of Setúbal, Setúbal, Portugal & INSTICC, Setúbal, Portugal
José Cordeiro
Wroclaw University of Economics, Wroclaw, Poland & Macquarie University, Sydney, NSW, Australia
Leszek A. Maciaszek
Polytechnic Institute of Setúbal, Setúbal, Portugal & INSTICC, Setúbal, Portugal
Joaquim Filipe

Cite this paper

Rudra, A., Gopalan, R.P., Achuthan, N.R. (2014). Estimating Sufficient Sample Sizes for Approximate Decision Support Queries. In: Hammoudi, S., Cordeiro, J., Maciaszek, L., Filipe, J. (eds) Enterprise Information Systems. ICEIS 2013. Lecture Notes in Business Information Processing, vol 190. Springer, Cham. https://doi.org/10.1007/978-3-319-09492-2_6

DOI: https://doi.org/10.1007/978-3-319-09492-2_6
Published: 25 July 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09491-5
Online ISBN: 978-3-319-09492-2
eBook Packages: Computer Science, Computer Science (R0)