Optimizing throughput and resource utilization using pipelining: Transformation based approach

Journal of VLSI signal processing systems for signal, image and video technology

Abstract

A simple formulation of pipelining: “Pipelining with N stages is equivalent to retiming where the number of delays on all inputs or all outputs, but not both, is increased by N” is used as the basis for a convenient and efficient treatment of pipelining in the design of application specific computers.
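The formulation above can be illustrated with a small sketch. It is not the authors' implementation: the edge-list graph representation, the function names (`add_input_delays`, `retime`, `critical_path`), and the example graph are assumptions made for illustration. The retiming rule used is the standard one from Leiserson and Saxe, where an edge u→v with w delays ends up with w + lag(v) − lag(u) delays.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, int]  # (source node, sink node, number of delays)


def add_input_delays(edges: List[Edge], inputs: Set[str], n: int) -> List[Edge]:
    """Pipelining step: add n delays to every edge that leaves an input node."""
    return [(u, v, w + n if u in inputs else w) for u, v, w in edges]


def retime(edges: List[Edge], lag: Dict[str, int]) -> List[Edge]:
    """Apply a retiming: edge u->v gets w + lag[v] - lag[u] delays."""
    result = []
    for u, v, w in edges:
        w_new = w + lag.get(v, 0) - lag.get(u, 0)
        if w_new < 0:
            raise ValueError(f"illegal retiming: edge {u}->{v} would get {w_new} delays")
        result.append((u, v, w_new))
    return result


def critical_path(edges: List[Edge], time: Dict[str, int]) -> int:
    """Longest computation time along a path of zero-delay edges.

    Assumes the zero-delay subgraph is acyclic.
    """
    successors: Dict[str, List[str]] = {}
    for u, v, w in edges:
        if w == 0:
            successors.setdefault(u, []).append(v)
    memo: Dict[str, int] = {}

    def longest(node: str) -> int:
        if node not in memo:
            tail = max((longest(s) for s in successors.get(node, [])), default=0)
            memo[node] = time.get(node, 0) + tail
        return memo[node]

    nodes = {x for u, v, _ in edges for x in (u, v)}
    return max(longest(x) for x in nodes)


# Hypothetical example: a chain of three unit-time operations.
edges = [("in", "a", 0), ("a", "b", 0), ("b", "c", 0), ("c", "out", 0)]
time = {"a": 1, "b": 1, "c": 1}  # "in" and "out" take no time

before = critical_path(edges, time)
pipelined = add_input_delays(edges, {"in"}, 2)        # N = 2 delays on all inputs
retimed = retime(pipelined, {"a": -2, "b": -1})       # move the delays into the chain
after = critical_path(retimed, time)
print(before, after)
```

In this example the two delays added at the input are moved by retiming onto the edges a→b and b→c, so no zero-delay path contains more than one operation.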

A classification of pipelining according to the objective function (throughput or resource utilization) and the latency is introduced. For two polynomial complexity pipelining classes, optimal algorithms are presented. For two other classes, both proofs of NP-completeness and efficient probabilistic algorithms are presented. Both theoretical and experimental properties of pipelining are discussed, and a relationship with other transformations is explored. Because software pipelining and the pipelining presented here have similar formulations, all results can be easily modified for use in compilers for general purpose computers. We have also developed a polynomial complexity algorithm for determining the iteration bound.
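For readers unfamiliar with the iteration bound, the following sketch may help. In the standard definition, the iteration bound of a recursive data flow graph is the maximum, over all cycles, of the total computation time on the cycle divided by the number of delays on it. The abstract does not describe the authors' algorithm, so the code below swaps in a generic technique: binary search on the candidate period combined with Bellman-Ford negative-cycle detection. The function names, the graph representation, and the example graph are assumptions, and the result is accurate only to the chosen tolerance.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int]  # (source node, sink node, number of delays)


def has_cycle_exceeding(edges: List[Edge], time: Dict[str, float], period: float) -> bool:
    """True if some cycle has total computation time > period * total delays.

    Each edge u->v gets cost period * delays - time[u]; a cycle violates the
    period exactly when its total cost is negative (Bellman-Ford check).
    """
    nodes = {x for u, v, _ in edges for x in (u, v)}
    dist = {x: 0.0 for x in nodes}  # equivalent to a virtual source reaching all nodes
    for _ in range(len(nodes)):
        changed = False
        for u, v, w in edges:
            cost = period * w - time.get(u, 0.0)
            if dist[u] + cost < dist[v] - 1e-12:
                dist[v] = dist[u] + cost
                changed = True
        if not changed:
            return False  # distances converged, so no negative cycle
    return True  # still relaxing after |V| passes, so a negative cycle exists


def iteration_bound(edges: List[Edge], time: Dict[str, float], tol: float = 1e-9) -> float:
    """Approximate the iteration bound by binary search on the period."""
    lo, hi = 0.0, float(sum(time.values()))
    if has_cycle_exceeding(edges, time, hi):
        raise ValueError("graph contains a cycle without delays")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if has_cycle_exceeding(edges, time, mid):
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical example with two loops:
#   a -> b -> a : time 1 + 2 = 3 over 2 delays, ratio 1.5
#   b -> c -> b : time 2 + 2 = 4 over 1 delay,  ratio 4
loop_edges = [("a", "b", 0), ("b", "a", 2), ("b", "c", 0), ("c", "b", 1)]
loop_time = {"a": 1.0, "b": 2.0, "c": 2.0}
bound = iteration_bound(loop_edges, loop_time)
print(round(bound, 6))
```

Each feasibility check costs O(|V||E|) and the number of checks grows with the logarithm of the search range divided by the tolerance, so this sketch runs in polynomial time for a fixed tolerance, although it returns an approximation rather than an exact rational value.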

Additional information

This work was done while the first author was at the University of California, Berkeley.

About this article

Cite this article

Potkonjak, M., Rabaey, J. Optimizing throughput and resource utilization using pipelining: Transformation based approach. Journal of VLSI Signal Processing 8, 117–130 (1994). https://doi.org/10.1007/BF02109380

  • DOI: https://doi.org/10.1007/BF02109380
