DOI: 10.1145/3206333.3206337
short-paper

Exploiting Data Partitioning To Provide Approximate Results

Published: 15 June 2018

ABSTRACT

Co-hash partitioning is a popular partitioning strategy in distributed query processing in which tables are co-located according to their join predicates. In this paper, we study the benefits of co-hash partitioning for obtaining approximate answers.
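To make the idea concrete, here is a minimal sketch under our own assumptions, not the authors' method. It uses a toy `orders`/`lineitem` pair and hypothetical helpers (`co_hash_partition`, `local_join_sum`, `approx_join_sum`). Both tables are hash-partitioned on the join key `orderkey`, so every matching tuple pair lands in the same partition. Evaluating the join on a uniformly random subset of k of the N partitions then yields a cluster sample, and scaling the partial SUM by N/k gives an unbiased estimate of the full join SUM.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy schema (assumption): orders(orderkey, custkey), lineitem(orderkey, price)
Order = Tuple[int, int]
LineItem = Tuple[int, float]
Partition = Tuple[List[Order], List[LineItem]]


def co_hash_partition(orders: List[Order], lineitems: List[LineItem],
                      n_parts: int) -> Dict[int, Partition]:
    """Co-hash both tables on the join key (orderkey) so that an order and
    all of its line items are stored in the same partition."""
    parts: Dict[int, Partition] = defaultdict(lambda: ([], []))
    for o in orders:
        parts[hash(o[0]) % n_parts][0].append(o)
    for li in lineitems:
        parts[hash(li[0]) % n_parts][1].append(li)
    return parts


def local_join_sum(part: Partition) -> float:
    """SUM(price) over orders JOIN lineitem, computed inside one partition only.
    No data movement is needed because matching keys are co-located."""
    orders, lineitems = part
    keys = {o[0] for o in orders}
    return sum(price for (k, price) in lineitems if k in keys)


def approx_join_sum(parts: Dict[int, Partition], n_parts: int, k: int,
                    seed: int = 0) -> float:
    """Estimate the full join SUM from k randomly chosen partitions.
    Partitions are treated as clusters; the N/k scale-up is the standard
    unbiased estimator of a total under simple random cluster sampling."""
    rng = random.Random(seed)
    chosen = rng.sample(range(n_parts), k)
    partial = sum(local_join_sum(parts[p]) for p in chosen)
    return partial * n_parts / k


if __name__ == "__main__":
    orders = [(i, i % 7) for i in range(1000)]
    lineitems = [(i % 1000, float(i % 13)) for i in range(5000)]
    parts = co_hash_partition(orders, lineitems, n_parts=8)
    exact = sum(local_join_sum(parts[p]) for p in range(8))
    print("exact:", exact, "estimate (4 of 8 partitions):",
          approx_join_sum(parts, 8, 4, seed=1))
```

Sampling whole co-located partitions, rather than sampling each table independently, keeps every join match of a sampled key intact; independent uniform samples of two tables at rate p retain only about a p² fraction of the join output.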


        Published in

          ACM Conferences
          BeyondMR'18: Proceedings of the 5th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond
          June 2018
          54 pages
          ISBN:9781450357036
          DOI:10.1145/3206333

          Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Qualifiers

          • short-paper
          • Research
          • Refereed limited

          Acceptance Rates

          Overall Acceptance Rate: 19 of 36 submissions, 53%
