Abstract
The majority of scientific and Digital Signal Processing (DSP) applications are recursive or iterative. Transformation techniques are generally applied to increase the parallelism of their nested loops. Most existing loop transformation techniques either cannot achieve maximum parallelism, or achieve it only at the cost of complicated loop bound and loop index calculations. This paper proposes a new technique, loop striping, that maximizes parallelism while maintaining the original row-wise execution sequence with minimum overhead. Loop striping groups iterations into stripes, where a stripe is a group of iterations that are all mutually independent and can therefore be executed in parallel. Theorems and efficient algorithms are proposed for loop striping transformations. The experimental results show that loop striping always achieves a better iteration period than software pipelining and loop unfolding, improving the average iteration period by 50% and 54%, respectively.
This work is partially supported by the TI University Program, NSF EIA-0103709, Texas ARP 009741-0028-2001, NSF CCR-0309461, NSF IIS-0513669, and Microsoft, USA.
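The stripe definition in the abstract (a group of iterations that are all independent of one another) can be illustrated with a small checker. This is only a sketch under stated assumptions, not the algorithm proposed in the paper: it assumes a two-level loop nest with uniform, lexicographically positive dependence vectors, it ignores loop bounds (so it may conservatively report a dependence whose path would leave the iteration space), and the names `reaches`, `is_stripe`, and the example vectors are hypothetical.

```python
from itertools import combinations


def reaches(delta, deps):
    """Return True if delta is a non-empty sum of dependence vectors.

    Assumption: every vector in deps is lexicographically positive,
    i.e. (a, b) with a > 0, or a == 0 and b > 0.
    """
    dx, dy = delta
    # Largest drop of the inner index that a single dependence can cause.
    max_drop = max([-b for a, b in deps if a > 0 and b < 0], default=0)
    seen = set()
    stack = [(0, 0)]
    while stack:
        x, y = stack.pop()
        for a, b in deps:
            nx, ny = x + a, y + b
            if (nx, ny) == (dx, dy):
                return True
            # The outer index never decreases, so overshooting is a dead end.
            if nx > dx:
                continue
            # The inner index can fall by at most max_drop per remaining outer step.
            if ny > dy + (dx - nx) * max_drop:
                continue
            if (nx, ny) not in seen:
                seen.add((nx, ny))
                stack.append((nx, ny))
    return False


def is_stripe(iterations, deps):
    """Return True if no dependence path links any two of the given iterations."""
    for (i1, j1), (i2, j2) in combinations(iterations, 2):
        delta = (i2 - i1, j2 - j1)
        reverse = (i1 - i2, j1 - j2)
        if reaches(delta, deps) or reaches(reverse, deps):
            return False
    return True


# Hypothetical loop body: A[i][j] = A[i][j-1] + A[i-1][j+1],
# which gives the dependence vectors (0, 1) and (1, -1).
example_deps = [(0, 1), (1, -1)]
print(is_stripe([(0, 0), (0, 1)], example_deps))          # neighbours in one row
print(is_stripe([(0, 4), (1, 2), (2, 0)], example_deps))  # group spread over three rows
```

With these assumed vectors, two neighbouring iterations in the same row are linked by the (0, 1) dependence and cannot share a stripe, while the three iterations taken from consecutive rows are mutually independent. The check follows dependence paths rather than single vectors, because two iterations linked through a third one still cannot run in parallel.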
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xue, C., Shao, Z., Liu, M., Qiu, M., Sha, E.H.M. (2006). Loop Striping: Maximize Parallelism for Nested Loops. In: Sha, E., Han, SK., Xu, CZ., Kim, MH., Yang, L.T., Xiao, B. (eds) Embedded and Ubiquitous Computing. EUC 2006. Lecture Notes in Computer Science, vol 4096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11802167_42
DOI: https://doi.org/10.1007/11802167_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36679-9
Online ISBN: 978-3-540-36681-2
eBook Packages: Computer Science (R0)