Abstract

The majority of scientific and Digital Signal Processing (DSP) applications are recursive or iterative. Transformation techniques are generally applied to the nested loops in these applications to increase parallelism. Most existing loop transformation techniques either cannot achieve maximum parallelism, or achieve it only at the cost of complicated loop bound and loop index calculations. This paper proposes a new technique, loop striping, that maximizes parallelism while maintaining the original row-wise execution sequence with minimum overhead. Loop striping groups iterations into stripes, where all iterations in a stripe are independent and can be executed in parallel. Theorems and efficient algorithms are proposed for loop striping transformations. The experimental results show that loop striping always achieves a better iteration period than software pipelining and loop unfolding, improving the average iteration period by 50% and 54%, respectively.
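The idea of a stripe as a group of mutually independent iterations can be illustrated with a small sketch. This is not the paper's algorithm, whose theorems and construction are not given in the abstract. It assumes a two-level loop with uniform dependence vectors written as (row distance, column distance), and it assumes a stripe is simply a run of consecutive iterations within one row. The names `stripe_is_independent` and `stripes` are hypothetical.

```python
from typing import Iterable, List, Tuple

Dep = Tuple[int, int]   # assumed form: (row distance, column distance)
Iter = Tuple[int, int]  # iteration index (i, j)


def stripe_is_independent(deps: Iterable[Dep], width: int) -> bool:
    """Check that `width` consecutive iterations of one row share no dependence.

    Under the assumptions above, two iterations in the same row depend on
    each other only through a vector (0, dj) whose column distance is
    smaller than the stripe width.
    """
    for di, dj in deps:
        if di == 0 and 0 < abs(dj) < width:
            return False
    return True


def stripes(rows: int, cols: int, width: int) -> List[List[Iter]]:
    """Partition the iteration space into stripes, keeping row-wise order."""
    result: List[List[Iter]] = []
    for i in range(rows):
        for j0 in range(0, cols, width):
            result.append([(i, j) for j in range(j0, min(j0 + width, cols))])
    return result


if __name__ == "__main__":
    deps = [(1, 0), (0, 3), (1, -1)]  # illustrative dependence vectors
    for w in (2, 3, 4):
        print(w, stripe_is_independent(deps, w))
    print(stripes(2, 5, 2))
```

In this simplified model, rows still execute in their original order and stripes within a row execute left to right, so only same-row dependences limit how wide a stripe can be.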


Author information

Corresponding author

Correspondence to Chun Xue.

Additional information

This work was partially supported by the TI University Program, NSF EIA-0103709, Texas ARP 009741-0028-2001, and NSF CCR-0309461, USA.

About this article

Cite this article

Xue, C., Shao, Z. & Sha, E.HM. Maximize Parallelism Minimize Overhead for Nested Loops via Loop Striping. J VLSI Sign Process Syst Sign Image Video Technol 47, 153–167 (2007). https://doi.org/10.1007/s11265-006-0034-5

  • DOI: https://doi.org/10.1007/s11265-006-0034-5
