Abstract
To reduce the elapsed time of a computation, a popular approach is to decompose the program into a collection of largely independent subtasks which are executed in parallel. Unfortunately, tightly-coupled parallel programs are often observed to run considerably slower than initially expected. In this paper, a framework for the analysis of parallel programs and their potential speedup is presented. Two parameters which strongly affect the scalability of parallelism are identified, namely the grain of synchronization and the degree to which the target hardware is available. It is shown that for certain classes of applications speedup is inherently poor, even if the program runs under the idealized conditions of perfect load balance, unbounded communication bandwidth, and negligible communication and parallelization overhead. Upper bounds are derived for the speedup that can be obtained in three different types of computations. An example illustrates the main findings.
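As a point of reference (not taken from the paper, whose bounds for the three classes of computations are more refined), the classical Amdahl bound is the simplest limit of this kind: assuming a fraction $s$ of the work is inherently sequential and the remaining fraction $1-s$ parallelizes perfectly over $p$ processors, the achievable speedup satisfies

\[
S(p) \;=\; \frac{1}{\,s + (1-s)/p\,} \;\le\; \frac{1}{s}.
\]

For example, a program with a 10% sequential fraction cannot be sped up by more than a factor of 10, regardless of the number of processors.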
Cite this paper
Schwarz, R. (1995). Speedup limits for tightly-coupled parallel computations. In: Birman, K.P., Mattern, F., Schiper, A. (eds) Theory and Practice in Distributed Systems. Lecture Notes in Computer Science, vol 938. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60042-6_17