Determining asynchronous pipeline execution times

  • Compiler Algorithms for Fine-Grain Parallelism
  • Conference paper

Languages and Compilers for Parallel Computing (LCPC 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1239)

Abstract

Asynchronous pipelining is a form of parallelism in which processors execute different loop tasks (loop statements) as opposed to different loop iterations. An asynchronous pipeline schedule for a loop is an assignment of loop tasks to processors, plus an order on instances of tasks assigned to the same processor. This variant of pipelining is particularly relevant in distributed memory systems (since pipeline control may be distributed across processors), but may also be used in shared memory systems.
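
As a concrete illustration of these two components (a minimal sketch with made-up task and processor names, not the paper's notation):

    # A loop body with three tasks, pipelined across two processors.
    # Component 1: assignment of loop tasks to processors.
    assignment = {"t1": "P0", "t2": "P0", "t3": "P1"}

    # Component 2: an order on instances of the tasks sharing P0;
    # here P0 interleaves its two tasks iteration by iteration:
    #   t1(0), t2(0), t1(1), t2(1), ...
    p0_order = [(t, i) for i in range(3) for t in ("t1", "t2")]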

Accurate estimation of the execution time of a pipeline schedule is needed to determine if pipelining is appropriate for a loop, and to compare alternative schedules. Pipeline execution of n iterations of a loop requires time at most a + bn, for some constants a and b. The coefficient b is the iteration interval of the pipeline schedule, and is the primary measure of the performance of a schedule. The startup time a is a secondary performance measure.
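
For illustration (the numbers here are ours, not the paper's): with startup time a = 40 and iteration interval b = 2, executing n = 1000 iterations takes at most 40 + 2·1000 = 2040 time units. As n grows, the fixed cost a is amortized away, which is why b is the primary measure and a only a secondary one.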

We generalize previous work on determining if a pipeline schedule will deadlock, and generalize Reiter's well-known formula [19] for determining the iteration interval b of a deadlock-free schedule, to account for nonzero communication times (easy) and the assignment of multiple tasks to processors (nontrivial). Two key components of our generalization are the use of pipeline scheduling edges, and the notion of negative data dependence distances (in a single unnested loop). We also discuss implementation of an asynchronous pipeline schedule at runtime; derive bounds on the startup time a; and discuss evaluation of the iteration interval formula, including development of a new algorithm.
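
To make the baseline concrete, here is a minimal sketch (in Python; the names and example graph are ours, and it implements only the classical special case of zero communication cost and one task per node, not the paper's generalization) of evaluating Reiter's formula [19]: for a deadlock-free schedule, the iteration interval b is the maximum, over all cycles C of the dependence graph, of the total task time on C divided by the total dependence distance on C. Rather than enumerating cycles, the sketch binary-searches for b, testing each trial rate r with Bellman-Ford positive-cycle detection (edge weight = source task time − r × distance), in the spirit of the cost-to-time ratio cycle methods of [9, 15]. It assumes integer distances whose sum around every cycle is at least 1.

    def _has_positive_cycle(nodes, arcs, eps=1e-12):
        # Bellman-Ford variant: longest-path relaxation from an implicit
        # source (all distances start at 0).  If relaxations still succeed
        # on every one of |V| passes, some cycle has positive total weight.
        dist = {v: 0.0 for v in nodes}
        changed = False
        for _ in range(len(nodes)):
            changed = False
            for u, v, w in arcs:
                if dist[u] + w > dist[v] + eps:
                    dist[v] = dist[u] + w
                    changed = True
            if not changed:
                return False
        return changed

    def iteration_interval(task_time, edges, tol=1e-7):
        # task_time: task -> execution time.
        # edges: (u, v, d) dependence or scheduling edges with (possibly
        # negative) distance d.  Deadlock-freedom means every cycle has
        # positive total distance, so each cycle's reweighted total
        # T(C) - r * D(C) decreases as the trial rate r grows.
        nodes = list(task_time)
        lo, hi = 0.0, sum(task_time.values())  # every cycle ratio <= hi
        while hi - lo > tol:
            r = (lo + hi) / 2.0
            # Charge each edge with its source task's time, minus the
            # trial rate times the edge's distance.
            arcs = [(u, v, task_time[u] - r * d) for u, v, d in edges]
            if _has_positive_cycle(nodes, arcs):
                lo = r  # some cycle achieves a ratio above r
            else:
                hi = r
        return (lo + hi) / 2.0

    # Two tasks in a cycle: t2 of iteration i feeds t1 of iteration i + 1.
    # The only cycle has total time 3 + 2 = 5 and total distance 1, so the
    # iteration interval is b = 5.
    print(iteration_interval({"t1": 3.0, "t2": 2.0},
                             [("t1", "t2", 0), ("t2", "t1", 1)]))

Note that the bisection is only meaningful for a deadlock-free schedule; with integer distances, this corresponds to every cycle having total distance at least 1, echoing the liveness condition for marked graphs [4].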

References

  1. Alexander Aiken and Alexandru Nicolau. Optimal loop parallelization. Proc. SIGPLAN '88 Conference on Programming Language Design and Implementation, Atlanta, GA, June 1988, pp. 308–317.

  2. Sati Banerjee, Takeo Hamada, Paul M. Chau, and Ronald D. Fellman. Macro pipelining based scheduling on high performance heterogeneous multiprocessor systems. IEEE Transactions on Signal Processing 43:8 (June 1995), pp. 1468–1484.

  3. Steven M. Burns. Performance analysis and optimization of asynchronous circuits. Ph.D. Thesis, California Institute of Technology, Pasadena, California, 1991.

  4. F. Commoner, A. W. Holt, S. Even, and A. Pnueli. Marked directed graphs. Journal of Computer and System Sciences 5:5 (October 1971), pp. 511–523.

  5. Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. MIT Press, Cambridge, MA, 1990.

  6. Val Donaldson and Jeanne Ferrante. Determining asynchronous acyclic pipeline execution times. Proc. 10th International Parallel Processing Symposium, Honolulu, HI, April 1996, pp. 568–572.

  7. Val Donaldson and Jeanne Ferrante. Determining asynchronous pipeline execution times. Technical Report CS96-481, Computer Science and Engineering Dept., University of California, San Diego, La Jolla, CA, April 1996.

  8. Franco Gasperoni and Uwe Schwiegelshohn. Scheduling loops on parallel processors: a simple algorithm with close to optimum performance. Second Joint International Conference on Vector and Parallel Processing (Parallel Processing: CONPAR 92-VAPP V), Lyon, France, September 1992, pp. 625–636.

  9. Mark Hartmann and James B. Orlin. Finding minimum cost to time ratio cycles with small integral transit times. Networks 23:6 (September 1993), pp. 567–574.

  10. Phu D. Hoang and Jan M. Rabaey. Scheduling of DSP programs onto multiprocessors for maximum throughput. IEEE Transactions on Signal Processing 41:6 (June 1993), pp. 2225–2235.

  11. Donald B. Johnson. Finding all the elementary circuits of a directed graph. SIAM Journal on Computing 4:1 (March 1975), pp. 77–84.

  12. Peter M. Kogge. The Architecture of Pipelined Computers. Hemisphere Publishing, New York, 1981.

  13. S. Y. Kung, P. S. Lewis, and S. C. Lo. Performance analysis and optimization of VLSI dataflow arrays. Journal of Parallel and Distributed Computing 4:6 (December 1987), pp. 592–618.

  14. Monica Lam. Software pipelining: an effective scheduling technique for VLIW machines. Proc. SIGPLAN '88 Conference on Programming Language Design and Implementation, Atlanta, GA, June 1988, pp. 318–328.

  15. Eugene L. Lawler. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart, and Winston, New York, 1976.

  16. F. Thomson Leighton. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. Morgan Kaufmann, San Mateo, CA, 1992.

  17. David A. Padua and Michael J. Wolfe. Advanced compiler optimizations for supercomputers. Communications of the ACM 29:12 (December 1986), pp. 1184–1201.

  18. C. V. Ramamoorthy and Gary S. Ho. Performance evaluation of asynchronous concurrent systems using Petri nets. IEEE Transactions on Software Engineering SE-6:5 (September 1980), pp. 440–449.

  19. Raymond Reiter. Scheduling parallel computations. Journal of the ACM 15:4 (October 1968), pp. 590–599.

  20. Vivek Sarkar. Partitioning and Scheduling Parallel Programs for Multiprocessors. MIT Press, Cambridge, MA, 1989.

  21. Tao Yang, Cong Fu, Apostolos Gerasoulis, and Vivek Sarkar. Mapping iterative task graphs on distributed memory machines. Proc. 24th International Conference on Parallel Processing, Oconomowoc, WI, August 1995, Vol II, pp. 151–158.

  22. Tao Yang and Apostolos Gerasoulis. DSC: scheduling parallel tasks on an unbounded number of processors. IEEE Transactions on Parallel and Distributed Systems 5:9 (September 1994), pp. 951–967.

Author information

Authors and Affiliations

Val Donaldson and Jeanne Ferrante, Computer Science and Engineering Dept., University of California, San Diego, La Jolla, CA

Editor information

David Sehr, Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Donaldson, V., Ferrante, J. (1997). Determining asynchronous pipeline execution times. In: Sehr, D., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1996. Lecture Notes in Computer Science, vol 1239. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0017251

  • DOI: https://doi.org/10.1007/BFb0017251

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63091-3

  • Online ISBN: 978-3-540-69128-0

  • eBook Packages: Springer Book Archive
