Abstract
Parallel database systems aim at providing high throughput for OLTP transactions as well as short response times for complex and data-intensive queries. Shared nothing systems represent the major architecture for parallel database processing. While the performance of such systems has been extensively analyzed in the past, the corresponding studies have made a number of best-case assumptions. In particular, almost all performance studies on parallel query processing assumed single-user mode, i.e., that the entire system is exclusively reserved for processing a single query. We study the performance of parallel join processing under more realistic conditions, in particular for multi-user mode. Experiments conducted with a detailed simulation model of shared nothing systems demonstrate the need for dynamic load balancing strategies for efficient join processing in multi-user mode. We focus on two major issues: (a) determining the number of processors to be allocated for the execution of join queries, and (b) determining which processors are to be chosen for join processing. For these scheduling decisions, we consider the current resource utilization as well as the size of intermediate results. Even simple dynamic scheduling strategies are shown to outperform static schemes by a large margin.
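The abstract names two scheduling decisions for join processing: (a) how many processors to allocate and (b) which processors to use. It bases both on current resource utilization and the size of intermediate results, but does not spell out the policies. The Python sketch below is a hypothetical illustration of what such a dynamic strategy could look like, not the authors' method. The names `schedule_join` and `static_schedule`, the parameters `pages_per_proc` and `max_util`, and their default values are all assumptions chosen for the example. The sketch sizes the degree of join parallelism from the estimated intermediate-result size, then picks the least-utilized processors and skips overloaded ones.

```python
import math
from typing import List, Sequence


def schedule_join(utilizations: Sequence[float], result_pages: int,
                  pages_per_proc: int = 100, max_util: float = 0.8) -> List[int]:
    """Return the ids of the processors chosen for one join (hypothetical policy).

    utilizations: current CPU utilization (0.0 to 1.0) of each processor, indexed by id
    result_pages: estimated size of the intermediate result to be joined, in pages
    pages_per_proc, max_util: illustrative tuning knobs, not taken from the paper
    """
    if not utilizations:
        raise ValueError("need at least one processor")
    # (a) Degree of parallelism: enough processors to keep each share
    #     of the intermediate result at or below pages_per_proc pages.
    wanted = max(1, math.ceil(result_pages / pages_per_proc))
    # (b) Processor selection: least-utilized processors first (ties broken by id).
    by_load = sorted(range(len(utilizations)), key=lambda p: (utilizations[p], p))
    # Skip overloaded processors; if every processor is overloaded,
    # fall back to the single least-loaded one.
    available = [p for p in by_load if utilizations[p] < max_util] or by_load[:1]
    return available[:wanted]


def static_schedule(num_procs: int, degree: int) -> List[int]:
    """Static baseline: a fixed degree and fixed processors, ignoring current load."""
    return list(range(min(degree, num_procs)))


loads = [0.9, 0.2, 0.5, 0.95, 0.1, 0.3]
print(schedule_join(loads, result_pages=250))  # [4, 1, 5]
print(static_schedule(len(loads), 3))          # [0, 1, 2]
```

With the same degree of 3, the static baseline places the join on processor 0 even though that processor runs at 90% utilization. The load-aware choice routes the work to processors 4, 1, and 5 instead. This sketch looks only at CPU utilization. A fuller strategy in the spirit of the abstract would probably also account for memory and disk load.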
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Marek, R., Rahm, E. (1993). On the performance of parallel join processing in shared nothing database systems. In: Bode, A., Reeve, M., Wolf, G. (eds) PARLE '93 Parallel Architectures and Languages Europe. PARLE 1993. Lecture Notes in Computer Science, vol 694. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-56891-3_50
DOI: https://doi.org/10.1007/3-540-56891-3_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-56891-9
Online ISBN: 978-3-540-47779-2
eBook Packages: Springer Book Archive