Abstract
This paper proposes a sampling method based on the Metropolis algorithm. The method draws samples that follow the same distribution as the underlying probability distribution. It is simple, efficient, and applicable to all distributions. We evaluate the quality of the samples by comparing their statistical properties with those of the underlying population. The experimental results show that the samples drawn by our method are genuinely representative of the population.
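The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of a textbook Metropolis sampler over a finite set of records, not the authors' actual method. It assumes each record has a non-negative, unnormalized weight and uses a symmetric uniform proposal; the function name `metropolis_sample` and the parameters `burn_in` and `thin` are hypothetical names chosen for illustration.

```python
import random
from collections import Counter


def metropolis_sample(weights, n_samples, burn_in=1000, thin=1, seed=0):
    """Return record indices drawn approximately in proportion to `weights`.

    Assumption: `weights` are unnormalized, non-negative target densities,
    one per record. This is a generic Metropolis chain, not the paper's
    specific algorithm.
    """
    if not any(w > 0 for w in weights):
        raise ValueError("at least one weight must be positive")

    rng = random.Random(seed)
    n = len(weights)

    # Start the chain at a record with positive weight.
    current = rng.randrange(n)
    while weights[current] <= 0:
        current = rng.randrange(n)

    samples = []
    step = 0
    while len(samples) < n_samples:
        # Symmetric proposal: pick any record uniformly at random.
        proposal = rng.randrange(n)
        # Metropolis rule: accept with probability min(1, w_new / w_old).
        ratio = weights[proposal] / weights[current]
        if rng.random() < min(1.0, ratio):
            current = proposal
        step += 1
        # Discard the burn-in steps, then keep every `thin`-th state.
        if step > burn_in and (step - burn_in) % thin == 0:
            samples.append(current)
    return samples


if __name__ == "__main__":
    weights = [1.0, 2.0, 3.0, 4.0]
    counts = Counter(metropolis_sample(weights, n_samples=40000))
    total = sum(weights)
    for i, w in enumerate(weights):
        print(i, round(counts[i] / 40000, 3), "target", w / total)
```

Because only the ratio of weights enters the acceptance rule, the target distribution never needs to be normalized, which is the property that makes Metropolis-style sampling attractive when the normalizing constant is expensive to compute. Consecutive states of the chain are correlated, so a larger `thin` value trades running time for more nearly independent samples.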
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Guo, H., Hou, WC., Yan, F., Zhu, Q. (2005). A Metropolis Sampling Method for Drawing Representative Samples from Large Databases. In: Zhou, L., Ooi, B.C., Meng, X. (eds) Database Systems for Advanced Applications. DASFAA 2005. Lecture Notes in Computer Science, vol 3453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11408079_21
DOI: https://doi.org/10.1007/11408079_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25334-1
Online ISBN: 978-3-540-32005-0
eBook Packages: Computer Science, Computer Science (R0)