Abstract
In this paper we survey known results on algorithms, data structures, and some applications of random sampling from databases. We first discuss various reasons for sampling from databases and for including sampling as a DBMS operator. We then consider basic sampling algorithms, sampling from trees, sampling from hash tables, and the use of auxiliary memory-resident index information to facilitate sampling.
This work was supported by the Director, Office of Energy Research, Office of Basic Energy Sciences, Applied Mathematical Sciences Division of the U.S. Department of Energy under Contract DE-AC03-76SF00098.
Copyright information
© 1990 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Olken, F., Rotem, D. (1990). Random sampling from database files: A survey. In: Michalewicz, Z. (ed.) Statistical and Scientific Database Management. SSDBM 1990. Lecture Notes in Computer Science, vol 420. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-52342-1_23
DOI: https://doi.org/10.1007/3-540-52342-1_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-52342-0
Online ISBN: 978-3-540-46968-1
eBook Packages: Springer Book Archive