Loop Striping: Maximize Parallelism for Nested Loops

  • Conference paper
Embedded and Ubiquitous Computing (EUC 2006)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 4096)

Abstract

Most scientific and Digital Signal Processing (DSP) applications are recursive or iterative and are commonly written as nested loops. Transformation techniques are generally applied to increase the parallelism of these nested loops. Most existing loop transformation techniques either cannot achieve maximum parallelism, or achieve it only at the cost of complicated loop bound and loop index calculations. This paper proposes a new technique, loop striping, that maximizes parallelism while maintaining the original row-wise execution sequence with minimum overhead. Loop striping groups iterations into stripes, where a stripe is a group of iterations that are mutually independent and can therefore be executed in parallel. Theorems and efficient algorithms are proposed for loop striping transformations. The experimental results show that loop striping always achieves a better iteration period than software pipelining and loop unfolding, improving the average iteration period by 50% and 54%, respectively.
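The defining property of a stripe, that all of its iterations are independent, can be illustrated with a small check. The following is a minimal sketch and not the algorithm from the paper: it assumes a two-dimensional iteration space with uniform dependence distance vectors, and the names `reachable` and `is_independent`, the 4×4 space, and the dependence set `[(0, 1), (1, 0)]` are all hypothetical choices made for illustration.

```python
from collections import deque


def reachable(src, deps, rows, cols):
    """Return every iteration that depends, directly or transitively, on src.

    deps is a list of dependence distance vectors (di, dj); an iteration
    (i + di, j + dj) depends on (i, j). Only iterations inside the
    rows x cols iteration space are considered.
    """
    seen = set()
    queue = deque([src])
    while queue:
        i, j = queue.popleft()
        for di, dj in deps:
            nxt = (i + di, j + dj)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_independent(group, deps, rows, cols):
    """True if no iteration in group depends on another iteration in group."""
    members = set(group)
    return all(
        members.isdisjoint(reachable(it, deps, rows, cols)) for it in group
    )


# Hypothetical example: each iteration depends on its left and upper neighbor.
deps = [(0, 1), (1, 0)]
row = [(0, j) for j in range(4)]            # one full row of the loop nest
diagonal = [(0, 3), (1, 2), (2, 1), (3, 0)] # one iteration taken from each row

print(is_independent(row, deps, 4, 4))       # False: (0, 1) chains the row
print(is_independent(diagonal, deps, 4, 4))  # True: no dependence path inside
```

Under these assumed dependences, a whole row cannot run in parallel, while a group that takes one suitably offset iteration from each row can. How the paper actually chooses and orders its stripes is given by its theorems and algorithms, which the abstract does not detail.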

This work is partially supported by the TI University Program, NSF EIA-0103709, Texas ARP 009741-0028-2001, NSF CCR-0309461, NSF IIS-0513669, and Microsoft, USA.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xue, C., Shao, Z., Liu, M., Qiu, M., Sha, E.H.M. (2006). Loop Striping: Maximize Parallelism for Nested Loops. In: Sha, E., Han, SK., Xu, CZ., Kim, MH., Yang, L.T., Xiao, B. (eds) Embedded and Ubiquitous Computing. EUC 2006. Lecture Notes in Computer Science, vol 4096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11802167_42

  • DOI: https://doi.org/10.1007/11802167_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-36679-9

  • Online ISBN: 978-3-540-36681-2

  • eBook Packages: Computer Science, Computer Science (R0)
