DOI: 10.1145/304182.304206

On random sampling over joins

Published: 01 June 1999

ABSTRACT

A major bottleneck in implementing sampling as a primitive relational operation is the inefficiency of sampling the output of a query. It is not even known whether it is possible to generate a sample of a join tree without first evaluating the join tree completely. We undertake a detailed study of this problem and attempt to analyze it in a variety of settings. We present theoretical results explaining the difficulty of this problem and setting limits on the efficiency that can be achieved. Based on new insights into the interaction between join and sampling, we develop join sampling techniques for the settings where our negative results do not apply. Our new sampling algorithms are significantly more efficient than those known earlier. We present an experimental evaluation of our techniques on Microsoft's SQL Server 7.0.


Published in

SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of data
June 1999
604 pages
ISBN: 1581130848
DOI: 10.1145/304182

Copyright © 1999 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
