Abstract
Efficient use of multiple pipelined functional units and of registers is essential for achieving high performance on modern processors. Instruction Level Parallelism (ILP) addresses the former, and register reuse (through register tiling) addresses the latter. Program transformations that expose and exploit ILP and register reuse interact with each other in subtle ways. We study the combined problem of optimal ILP and register reuse for the class of uniform dependence, fully permutable, rectangular loop nests. We develop an analytical model of the combined problem and formulate a mathematical optimization problem that chooses the parameters of the ILP-exposing transformation and of register tiling so as to minimize the total execution time. We distinguish two cases, according to whether or not loop permutation can expose a parallel loop. We show that the combined problem reduces to a single integer convex optimization problem in the former case and to a set of integer convex optimization problems in the latter; both can be solved to global optimality.
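As a rough illustration of the two mechanisms named in the abstract, the sketch below applies a 2x2 register tile to a matrix-multiply loop nest, which is a rectangular, fully permutable nest. It is not the paper's analytical model or its optimization framework: the function names, the choice of matrix multiplication, and the fixed 2x2 tile size are assumptions made only for this example, and Python is used to show the loop structure rather than real register behavior.

```python
def matmul_plain(A, B, n):
    """Reference triple loop: C[i][j] += A[i][k] * B[k][j]."""
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_register_tiled(A, B, n):
    """Same computation with a 2x2 register tile over (i, j).

    Assumes n is even so that no cleanup loops are needed.
    """
    assert n % 2 == 0
    C = [[0.0] * n for _ in range(n)]
    for i in range(0, n, 2):
        for j in range(0, n, 2):
            # Four scalar accumulators stand in for registers.
            c00 = c01 = c10 = c11 = 0.0
            for k in range(n):
                # Four loads feed four multiply-adds (register reuse);
                # the untiled loop needs two loads per multiply-add.
                a0, a1 = A[i][k], A[i + 1][k]
                b0, b1 = B[k][j], B[k][j + 1]
                # The four updates are mutually independent, so pipelined
                # functional units can overlap them (ILP).
                c00 += a0 * b0
                c01 += a0 * b1
                c10 += a1 * b0
                c11 += a1 * b1
            C[i][j], C[i][j + 1] = c00, c01
            C[i + 1][j], C[i + 1][j + 1] = c10, c11
    return C
```

A larger tile increases both reuse and the number of independent operations, but it also needs more registers. That tension is the kind of interaction the abstract says the optimization problem must balance; here the tile size is simply hard-coded.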
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Renganarayana, L., Ramakrishna, U., Rajopadhye, S. (2006). Combined ILP and Register Tiling: Analytical Model and Optimization Framework. In: Ayguadé, E., Baumgartner, G., Ramanujam, J., Sadayappan, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2005. Lecture Notes in Computer Science, vol 4339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69330-7_17
DOI: https://doi.org/10.1007/978-3-540-69330-7_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69329-1
Online ISBN: 978-3-540-69330-7
eBook Packages: Computer Science (R0)