
Lower bounds on the iteration time and the initiation interval of functional pipelining and loop folding

Design Automation for Embedded Systems

Abstract

The performance of a pipelined datapath implementation is measured mainly by three parameters: the clock cycle length, the initiation interval between successive iterations (the inverse of the throughput), and the iteration time (the turn-around time). In this paper we present a new method for computing performance bounds of pipelined implementations:

  • Given an iterative behavior, a set of resource constraints and a target initiation interval, we derive a lower bound on the iteration time achievable by any pipelined implementation.

  • Given an iterative behavior and a set of resource constraints, we derive a lower bound on the initiation interval achievable by any pipelined implementation.

The method has low complexity and handles behavioral specifications containing loop statements with inter-iteration data dependencies and timing constraints.
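For orientation, the quantity bounded in the second item can be illustrated with the classical textbook lower bound on the initiation interval: the larger of a resource-driven term and a recurrence-driven term. The sketch below is not the method of this paper, which the abstract does not detail. The function names, the edge encoding `(src, dst, latency, distance)`, and the example loop are assumptions made purely for illustration.

```python
from math import ceil


def resource_bound(op_counts, units):
    """Each unit starts at most one operation per cycle, so an iteration
    with n operations of a type on r units needs at least ceil(n / r) cycles."""
    return max(ceil(op_counts[t] / units[t]) for t in op_counts)


def has_positive_cycle(edges, num_nodes, ii):
    """Bellman-Ford style longest-path relaxation with edge weights
    latency - ii * distance. A positive cycle means ii is infeasible."""
    dist = [0] * num_nodes
    for _ in range(num_nodes):
        changed = False
        for src, dst, latency, distance in edges:
            weight = latency - ii * distance
            if dist[src] + weight > dist[dst]:
                dist[dst] = dist[src] + weight
                changed = True
        if not changed:
            return False
    return True


def recurrence_bound(edges, num_nodes):
    """Smallest integer ii such that every dependency cycle satisfies
    total latency <= ii * total inter-iteration distance."""
    upper = max(1, sum(latency for _, _, latency, _ in edges))
    for ii in range(1, upper + 1):
        if not has_positive_cycle(edges, num_nodes, ii):
            return ii
    raise ValueError("a dependency cycle has zero total distance")


def initiation_interval_bound(op_counts, units, edges, num_nodes):
    """Classical lower bound: the larger of the two terms above."""
    return max(resource_bound(op_counts, units),
               recurrence_bound(edges, num_nodes))


# Hypothetical loop body: three operations forming one recurrence.
# Edge (src, dst, latency, distance): dst of iteration i + distance
# may start no earlier than latency cycles after src of iteration i.
edges = [(0, 1, 1, 0), (1, 2, 2, 0), (2, 0, 1, 1)]
op_counts = {"mul": 1, "add": 2}
units = {"mul": 1, "add": 1}

print(initiation_interval_bound(op_counts, units, edges, 3))  # 4
```

In this example the recurrence term dominates: the single cycle has total latency 4 and distance 1, so no schedule can start iterations more often than every 4 cycles, whereas the two additions sharing one adder would only require 2.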




Cite this article

Bennour, I.E., Aboulhamid, E.M. Lower bounds on the iteration time and the initiation interval of functional pipelining and loop folding. Des Autom Embed Syst 1, 333–355 (1996). https://doi.org/10.1007/BF00209909

  • DOI: https://doi.org/10.1007/BF00209909
