Load Balancing for Parallel Query Execution on NUMA Multiprocessors

Bouganim, Luc; Florescu, Daniela; Valduriez, Patrick

doi:10.1023/A:1008642513285

Published: January 1999

Volume 7, pages 99–121, (1999)
Distributed and Parallel Databases

Abstract

To scale up to high-end configurations, shared-memory multiprocessors are evolving towards Non-Uniform Memory Access (NUMA) architectures. In this paper, we address the central problem of load balancing during parallel query execution on NUMA multiprocessors. We first show that an execution model for NUMA should not use data partitioning (as shared-nothing systems do) but should strive to exploit efficient shared-memory strategies like Synchronous Pipelining (SP). However, SP has problems in NUMA, especially with skewed data. Thus, we propose a new execution strategy which solves these problems. The basic idea is to allow partial materialization of intermediate results and to make them progressively public, i.e., able to be processed by any processor, as needed to avoid processor idle times. Hence, we call this strategy Progressive Sharing (PS). We conducted a performance comparison using an implementation of SP and PS on a 72-processor KSR1 computer, with many queries and large relations. With no skew, both SP and PS achieve linear speed-up. However, the impact of skew is very severe on SP performance while it is insignificant on PS. Finally, we show that, in NUMA, PS can also be beneficial in executing several pipeline chains concurrently.
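
The idea of making intermediate results progressively public can be illustrated with a toy simulation. The sketch below is not the authors' algorithm: the function name `simulate`, the unit cost per tuple, and the "publish half of the backlog whenever some processor is idle" policy are all assumptions chosen for illustration, and the model ignores NUMA remote-access costs entirely. It only shows why letting idle processors pick up published work reduces the effect of skew.

```python
from collections import deque


def simulate(work_per_proc, share):
    """Count time steps needed to process all tuples (one tuple per step per processor).

    work_per_proc: number of tuples initially private to each processor.
    share: if True, busy processors publish part of their private backlog
           whenever at least one processor was idle in the current step
           (a hypothetical policy, assumed here for illustration).
    """
    private = [deque(range(n)) for n in work_per_proc]
    public = deque()  # tuples any processor may process
    steps = 0
    while any(private) or public:
        steps += 1
        idle = []
        for p, queue in enumerate(private):
            if queue:
                queue.popleft()      # process a private tuple
            elif public:
                public.popleft()     # process a published tuple
            else:
                idle.append(p)       # nothing to do in this step
        if share and idle:
            # Make half of each private backlog public, only when needed.
            for queue in private:
                for _ in range(len(queue) // 2):
                    public.append(queue.pop())
    return steps


skewed = [8, 0, 0, 0]    # all tuples land on one processor
balanced = [2, 2, 2, 2]

print("skewed, no sharing:", simulate(skewed, share=False))
print("skewed, sharing:   ", simulate(skewed, share=True))
print("balanced, sharing: ", simulate(balanced, share=True))
```

In this toy model nothing is published while every processor is busy, so the balanced case behaves the same with or without sharing, while the skewed case finishes in fewer steps once sharing is enabled.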

Author information

Authors and Affiliations

INRIA Rocquencourt, France
Luc Bouganim, Daniela Florescu & Patrick Valduriez

About this article

Cite this article

Bouganim, L., Florescu, D. & Valduriez, P. Load Balancing for Parallel Query Execution on NUMA Multiprocessors. Distributed and Parallel Databases 7, 99–121 (1999). https://doi.org/10.1023/A:1008642513285
