Abstract

The throughput of a parallel execution of a DSP algorithm is limited by the iteration bound, which is the minimum period between the starts of consecutive iterations. It is given by $T_\infty = \max_i (T_i / D_i)$, where $T_i$ and $D_i$ are the total time of operations and the number of delays in loop $i$, respectively. The execution throughput of a DSP algorithm can be increased by reducing the $T_i$'s, and this reduction can be realized by taking as many operations as possible out of loops without changing the semantics of the computation. Since many DSP algorithms make extensive use of the four basic arithmetic operations, a simple and effective way to achieve this reduction is to apply commutativity, associativity and distributivity to these operations. This paper presents an optimization technique, called Loop Shrinking, which reduces the iteration bound by the above method. Loop Shrinking is based on a heuristic method that is time-efficient for simple cases but can also tackle complex examples. An implementation of Loop Shrinking is presented; the results show that it can yield a reduction in the iteration bound near or equal to that achieved by careful hand-tuning.
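For illustration only (this is not the paper's Loop Shrinking algorithm), the following minimal Python sketch evaluates the iteration-bound formula from the abstract and shows how taking an operation out of a loop by associativity can lower the bound. The loop data and operation latencies (a 2-unit multiply, a 1-unit add) are hypothetical.

```python
# Illustrative sketch, not the paper's implementation: the iteration bound is
# T_inf = max_i(T_i / D_i), where T_i is the total operation time in loop i
# and D_i is its number of delays.

def iteration_bound(loops):
    """loops: iterable of (T_i, D_i) pairs with D_i > 0."""
    return max(t / d for t, d in loops)

# Hypothetical operation latencies: multiply = 2 time units, add = 1.
MUL, ADD = 2, 1

# Loop 1 computes y[n] = a*y[n-1] + b + c around a single delay: one multiply
# and two adds lie on the cycle, so T_1 = 4, D_1 = 1.
# Loop 2 is an unrelated cycle with T_2 = 5 and D_2 = 2.
before = [(MUL + 2 * ADD, 1), (5, 2)]
print(iteration_bound(before))   # 4.0

# Using associativity, (b + c) can be computed off the loop, leaving
# y[n] = a*y[n-1] + (b + c) with only one add on the cycle: T_1 = 3.
after = [(MUL + ADD, 1), (5, 2)]
print(iteration_bound(after))    # 3.0
```

In this toy example, moving one addition off the critical loop lowers the bound from 4 to 3 time units, which is the kind of reduction Loop Shrinking targets.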


Cite this article

Maciel, F.B., Miyanaga, Y. & Tochinai, K. An optimization technique for lowering the iteration bound of DSP programs. J VLSI Sign Process Syst Sign Image Video Technol 5, 273–282 (1993). https://doi.org/10.1007/BF01581301
