Abstract
A simple formulation of pipelining: “Pipelining with N stages is equivalent to retiming where the number of delays on all inputs or all outputs, but not both, is increased by N” is used as the basis for a convenient and efficient treatment of pipelining in the design of application specific computers.
A classification of pipelining according to the objective function (throughput or resource utilization) and the latency is introduced. For two pipelining classes of polynomial complexity, optimal algorithms are presented. For two other classes, both proofs of NP-completeness and efficient probabilistic algorithms are presented. Both theoretical and experimental properties of pipelining are discussed, and its relationship with other transformations is explored. Because software pipelining and the pipelining presented here have similar formulations, all results can be easily modified for use in compilers for general purpose computers. We have also developed a polynomial complexity algorithm for determining the iteration bound.
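The formulation quoted in the abstract can be illustrated with a small sketch. It assumes a hypothetical representation that does not come from the article: a data flow graph as a list of (source, destination, delays) edges plus a table of integer computation times per node. The names `pipeline_inputs` and `iteration_bound` are illustrative only. The iteration bound is taken in its usual sense, the maximum over all cycles of total computation time divided by total delays, and it is computed here by brute-force enumeration of simple cycles, which is exponential in general and is not the polynomial complexity algorithm the abstract refers to.

```python
from fractions import Fraction


def pipeline_inputs(edges, inputs, n):
    """Add n delays to every edge that leaves a primary input (assumed model)."""
    return [(u, v, d + n if u in inputs else d) for (u, v, d) in edges]


def iteration_bound(edges, time):
    """Brute force: max over simple cycles of (sum of node times) / (sum of delays)."""
    adj = {}
    for u, v, d in edges:
        adj.setdefault(u, []).append((v, d))
    best = None

    def dfs(start, node, t_sum, d_sum, visited):
        nonlocal best
        for nxt, d in adj.get(node, []):
            if nxt == start:
                total_d = d_sum + d
                if total_d == 0:
                    raise ValueError("delay-free cycle: graph is not computable")
                ratio = Fraction(t_sum, total_d)
                if best is None or ratio > best:
                    best = ratio
            elif nxt not in visited and nxt > start:
                # Only visit nodes larger than start, so each cycle is
                # enumerated once, from its smallest node.
                dfs(start, nxt, t_sum + time[nxt], d_sum + d, visited | {nxt})

    for s in sorted(time):
        dfs(s, s, time[s], 0, {s})
    return best  # None if the graph has no cycle


# Hypothetical example: one recursion (add -> mul -> add) with a single delay.
time = {"in": 0, "add": 1, "mul": 2, "out": 0}
edges = [("in", "add", 0), ("add", "mul", 0), ("mul", "add", 1), ("add", "out", 0)]

pipelined = pipeline_inputs(edges, {"in"}, 2)  # N = 2 pipeline stages
print(iteration_bound(edges, time), iteration_bound(pipelined, time))
```

In this example the extra delays land only on the input edge, so the delay count on the recursion is unchanged and the iteration bound is the same before and after pipelining. A retiming step would then be free to redistribute the added delays through the feed-forward part of the graph.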
Additional information
This work was done while the first author was at the University of California, Berkeley.
Cite this article
Potkonjak, M., Rabaey, J. Optimizing throughput and resource utilization using pipelining: Transformation based approach. Journal of VLSI Signal Processing 8, 117–130 (1994). https://doi.org/10.1007/BF02109380
DOI: https://doi.org/10.1007/BF02109380