DOI: 10.1145/2926534.2926536
short-paper

Deterministic load balancing for parallel joins

Published: 26 June 2016

ABSTRACT

We study the problem of distributing the tuples of a relation to a number of processors organized in an r-dimensional hypercube, which is an important task for parallel join processing. In contrast to previous work, which proposed randomized algorithms for the task, we ask here the question of how to construct efficient deterministic distribution strategies that can optimally load balance the input relation. We first present some general lower bounds on the load for any dimension; these bounds depend not only on the size of the relation, but also on the maximum frequency of each value in the relation. We then construct an algorithm for the case of 1 dimension that is optimal within a constant factor, and an algorithm for the case of 2 dimensions that is optimal within a polylogarithmic factor. Our 2-dimensional algorithm is based on an interesting connection with the vector load balancing problem, a well-studied problem that generalizes classic load balancing.
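As a rough illustration of the 1-dimensional setting, the sketch below assigns each distinct value of a relation's join attribute to one of p processors using a standard deterministic greedy rule (heaviest value first, onto the currently least-loaded processor). This is not taken from the paper and is not claimed to be the authors' algorithm; the names `greedy_partition` and `lower_bound` are hypothetical. It also computes the simple lower bound max(N/p, f_max), where N is the relation size and f_max is the maximum frequency of any value, which matches the two quantities the abstract says the bounds depend on. A textbook property of this greedy rule is that the maximum load is at most N/p + f_max, which is within a factor of 2 of that bound.

```python
import heapq
from collections import Counter


def greedy_partition(values, p):
    """Deterministically assign each distinct value to one of p processors.

    Assumption: all tuples sharing a value must land on the same processor
    (as in a hash-style partition on the join attribute).
    Returns (assignment, loads): a value -> processor map and per-processor loads.
    """
    freq = Counter(values)
    # Min-heap of (current load, processor id); ties break on the lower id.
    heap = [(0, i) for i in range(p)]
    heapq.heapify(heap)
    assignment = {}
    # Heaviest values first; repr() breaks frequency ties deterministically.
    for value, count in sorted(freq.items(), key=lambda kv: (-kv[1], repr(kv[0]))):
        load, proc = heapq.heappop(heap)
        assignment[value] = proc
        heapq.heappush(heap, (load + count, proc))
    loads = [0] * p
    for value, count in freq.items():
        loads[assignment[value]] += count
    return assignment, loads


def lower_bound(values, p):
    """Simple lower bound on the maximum load: max(N / p, f_max)."""
    if not values:
        return 0
    freq = Counter(values)
    return max(len(values) / p, max(freq.values()))


if __name__ == "__main__":
    relation = ["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d"] * 2
    assignment, loads = greedy_partition(relation, p=2)
    print(assignment, loads, lower_bound(relation, p=2))
```

The example keeps every occurrence of a value on a single processor, so a very frequent value forces a high load no matter how the remaining values are placed; that is why the maximum frequency appears in the bound alongside N/p.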


Published in

BeyondMR '16: Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond
June 2016
70 pages
ISBN: 9781450343114
DOI: 10.1145/2926534

Copyright © 2016 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

BeyondMR '16 Paper Acceptance Rate: 10 of 19 submissions, 53%
Overall Acceptance Rate: 19 of 36 submissions, 53%