Determining asynchronous pipeline execution times

  • Compiler Algorithms for Fine-Grain Parallelism
  • Conference paper

Languages and Compilers for Parallel Computing (LCPC 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1239)

Abstract

Asynchronous pipelining is a form of parallelism in which processors execute different loop tasks (loop statements) as opposed to different loop iterations. An asynchronous pipeline schedule for a loop is an assignment of loop tasks to processors, plus an order on instances of tasks assigned to the same processor. This variant of pipelining is particularly relevant in distributed memory systems (since pipeline control may be distributed across processors), but may also be used in shared memory systems.
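
As a concrete illustration of these two components (a minimal sketch with made-up task and processor names, not the paper's notation):

    # A loop body with three tasks, pipelined across two processors.
    # Component 1: assignment of loop tasks to processors.
    assignment = {"t1": "P0", "t2": "P0", "t3": "P1"}

    # Component 2: an order on instances of the tasks sharing P0;
    # here P0 interleaves its two tasks iteration by iteration:
    #   t1(0), t2(0), t1(1), t2(1), ...
    p0_order = [(t, i) for i in range(3) for t in ("t1", "t2")]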

Accurate estimation of the execution time of a pipeline schedule is needed to determine if pipelining is appropriate for a loop, and to compare alternative schedules. Pipeline execution of n iterations of a loop requires time at most a + bn, for some constants a and b. The coefficient b is the iteration interval of the pipeline schedule, and is the primary measure of the performance of a schedule. The startup time a is a secondary performance measure.
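
For illustration (the numbers here are ours, not the paper's): with startup time a = 40 and iteration interval b = 2, executing n = 1000 iterations takes at most 40 + 2·1000 = 2040 time units. As n grows, the fixed cost a is amortized away, which is why b is the primary measure and a only a secondary one.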

We generalize previous work on determining if a pipeline schedule will deadlock, and generalize Reiter's well-known formula [19] for determining the iteration interval b of a deadlock-free schedule, to account for nonzero communication times (easy) and the assignment of multiple tasks to processors (nontrivial). Two key components of our generalization are the use of pipeline scheduling edges, and the notion of negative data dependence distances (in a single unnested loop). We also discuss implementation of an asynchronous pipeline schedule at runtime; derive bounds on the startup time a; and discuss evaluation of the iteration interval formula, including development of a new algorithm.
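
To make the baseline concrete, here is a minimal sketch (in Python; the names and example graph are ours, and it implements only the classical special case of zero communication cost and one task per node, not the paper's generalization) of evaluating Reiter's formula [19]: for a deadlock-free schedule, the iteration interval b is the maximum, over all cycles C of the dependence graph, of the total task time on C divided by the total dependence distance on C. Rather than enumerating cycles, the sketch binary-searches for b, testing each trial rate r with Bellman-Ford positive-cycle detection (edge weight = source task time − r × distance), in the spirit of the cost-to-time ratio cycle methods of [9, 15]. It assumes integer distances whose sum around every cycle is at least 1.

    def _has_positive_cycle(nodes, arcs, eps=1e-12):
        # Bellman-Ford variant: longest-path relaxation from an implicit
        # source (all distances start at 0).  If relaxations still succeed
        # on every one of |V| passes, some cycle has positive total weight.
        dist = {v: 0.0 for v in nodes}
        changed = False
        for _ in range(len(nodes)):
            changed = False
            for u, v, w in arcs:
                if dist[u] + w > dist[v] + eps:
                    dist[v] = dist[u] + w
                    changed = True
            if not changed:
                return False
        return changed

    def iteration_interval(task_time, edges, tol=1e-7):
        # task_time: task -> execution time.
        # edges: (u, v, d) dependence or scheduling edges with (possibly
        # negative) distance d.  Deadlock-freedom means every cycle has
        # positive total distance, so each cycle's reweighted total
        # T(C) - r * D(C) decreases as the trial rate r grows.
        nodes = list(task_time)
        lo, hi = 0.0, sum(task_time.values())  # every cycle ratio <= hi
        while hi - lo > tol:
            r = (lo + hi) / 2.0
            # Charge each edge with its source task's time, minus the
            # trial rate times the edge's distance.
            arcs = [(u, v, task_time[u] - r * d) for u, v, d in edges]
            if _has_positive_cycle(nodes, arcs):
                lo = r  # some cycle achieves a ratio above r
            else:
                hi = r
        return (lo + hi) / 2.0

    # Two tasks in a cycle: t2 of iteration i feeds t1 of iteration i + 1.
    # The only cycle has total time 3 + 2 = 5 and total distance 1, so the
    # iteration interval is b = 5.
    print(iteration_interval({"t1": 3.0, "t2": 2.0},
                             [("t1", "t2", 0), ("t2", "t1", 1)]))

Note that the bisection is only meaningful for a deadlock-free schedule; with integer distances, this corresponds to every cycle having total distance at least 1, echoing the liveness condition for marked graphs [4].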

References

  1. Alexander Aiken and Alexandru Nicolau. Optimal loop parallelization. Proc. SIGPLAN '88 Conference on Programming Language Design and Implementation, Atlanta, GA, June 1988, pp. 308–317.

  2. Sati Banerjee, Takeo Hamada, Paul M. Chau, and Ronald D. Fellman. Macro pipelining based scheduling on high performance heterogeneous multiprocessor systems. IEEE Transactions on Signal Processing 43:8 (June 1995), pp. 1468–1484.

  3. Steven M. Burns. Performance analysis and optimization of asynchronous circuits. Ph.D. Thesis, California Institute of Technology, Pasadena, California, 1991.

  4. F. Commoner, A. W. Holt, S. Even, and A. Pnueli. Marked directed graphs. Journal of Computer and System Sciences 5:5 (October 1971), pp. 511–523.

  5. Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. MIT Press, Cambridge, MA, 1990.

  6. Val Donaldson and Jeanne Ferrante. Determining asynchronous acyclic pipeline execution times. Proc. 10th International Parallel Processing Symposium, Honolulu, HI, April 1996, pp. 568–572.

  7. Val Donaldson and Jeanne Ferrante. Determining asynchronous pipeline execution times. Technical Report CS96-481, Computer Science and Engineering Dept., University of California, San Diego, La Jolla, CA, April 1996.

  8. Franco Gasperoni and Uwe Schwiegelshohn. Scheduling loops on parallel processors: a simple algorithm with close to optimum performance. Second Joint International Conference on Vector and Parallel Processing (Parallel Processing: CONPAR 92-VAPP V), Lyon, France, September 1992, pp. 625–636.

  9. Mark Hartmann and James B. Orlin. Finding minimum cost to time ratio cycles with small integral transit times. Networks 23:6 (September 1993), pp. 567–574.

  10. Phu D. Hoang and Jan M. Rabaey. Scheduling of DSP programs onto multiprocessors for maximum throughput. IEEE Transactions on Signal Processing 41:6 (June 1993), pp. 2225–2235.

  11. Donald B. Johnson. Finding all the elementary circuits of a directed graph. SIAM Journal on Computing 4:1 (March 1975), pp. 77–84.

  12. Peter M. Kogge. The Architecture of Pipelined Computers. Hemisphere Publishing, New York, 1981.

  13. S. Y. Kung, P. S. Lewis, and S. C. Lo. Performance analysis and optimization of VLSI dataflow arrays. Journal of Parallel and Distributed Computing 4:6 (December 1987), pp. 592–618.

  14. Monica Lam. Software pipelining: an effective scheduling technique for VLIW machines. Proc. SIGPLAN '88 Conference on Programming Language Design and Implementation, Atlanta, GA, June 1988, pp. 318–328.

  15. Eugene L. Lawler. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart, and Winston, New York, 1976.

  16. F. Thomson Leighton. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. Morgan Kaufmann, San Mateo, CA, 1992.

  17. David A. Padua and Michael J. Wolfe. Advanced compiler optimizations for supercomputers. Communications of the ACM 29:12 (December 1986), pp. 1184–1201.

  18. C. V. Ramamoorthy and Gary S. Ho. Performance evaluation of asynchronous concurrent systems using Petri nets. IEEE Transactions on Software Engineering SE-6:5 (September 1980), pp. 440–449.

  19. Raymond Reiter. Scheduling parallel computations. Journal of the ACM 15:4 (October 1968), pp. 590–599.

  20. Vivek Sarkar. Partitioning and Scheduling Parallel Programs for Multiprocessors. MIT Press, Cambridge, MA, 1989.

  21. Tao Yang, Cong Fu, Apostolos Gerasoulis, and Vivek Sarkar. Mapping iterative task graphs on distributed memory machines. Proc. 24th International Conference on Parallel Processing, Oconomowoc, WI, August 1995, Vol II, pp. 151–158.

  22. Tao Yang and Apostolos Gerasoulis. DSC: scheduling parallel tasks on an unbounded number of processors. IEEE Transactions on Parallel and Distributed Systems 5:9 (September 1994), pp. 951–967.

Author information

Authors and Affiliations

Val Donaldson and Jeanne Ferrante, Computer Science and Engineering Dept., University of California, San Diego, La Jolla, CA

Editor information

David Sehr, Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Donaldson, V., Ferrante, J. (1997). Determining asynchronous pipeline execution times. In: Sehr, D., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1996. Lecture Notes in Computer Science, vol 1239. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0017251

  • DOI: https://doi.org/10.1007/BFb0017251

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63091-3

  • Online ISBN: 978-3-540-69128-0

  • eBook Packages: Springer Book Archive
