
Loop Transformation Recipes for Code Generation and Auto-Tuning

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5898)

Abstract

In this paper, we describe transformation recipes, which provide a high-level interface to the code transformation and code generation capability of a compiler. These recipes can be generated by compiler decision algorithms or savvy software developers. This interface is part of an auto-tuning framework that explores a set of different implementations of the same computation and automatically selects the best-performing implementation. Along with the original computation, a transformation recipe specifies a range of implementations of the computation resulting from composing a set of high-level code transformations. In our system, an underlying polyhedral framework coupled with transformation algorithms takes this set of transformations, composes them and automatically generates correct code. We first describe an abstract interface for transformation recipes, which we propose to facilitate interoperability with other transformation frameworks. We then focus on the specific transformation recipe interface used in our compiler and present performance results on its application to kernel and library tuning and tuning of key computations in high-end applications. We also show how this framework can be used to generate and auto-tune parallel OpenMP or CUDA code from a high-level specification.
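To make the auto-tuning idea concrete, below is a minimal Python sketch of the search-and-select loop the abstract describes: a "recipe" here is reduced to a search space of transformation parameters (a tile size for one loop), each point in the space yields a different implementation of the same computation, and the tuner times the variants and keeps the fastest. The recipe representation, the matvec_tiled variant, and the parameter names are illustrative assumptions, not the CHiLL recipe interface or the polyhedral code generator described in the paper.

    import itertools
    import time

    N = 512
    A = [[float(i + j) for j in range(N)] for i in range(N)]
    x = [1.0] * N

    def matvec_tiled(A, x, tile):
        # Matrix-vector product with the i-loop tiled by `tile`; each tile
        # size is one implementation variant of the same computation.
        n = len(A)
        y = [0.0] * n
        for ii in range(0, n, tile):
            for i in range(ii, min(ii + tile, n)):
                row = A[i]
                s = 0.0
                for j in range(n):
                    s += row[j] * x[j]
                y[i] = s
        return y

    # The "recipe" here is just the parameter space the tuner explores
    # (hypothetical; the paper's recipes name composed loop transformations).
    recipe_space = {"tile": [16, 32, 64, 128, 256]}

    best = None
    for (tile,) in itertools.product(*recipe_space.values()):
        start = time.perf_counter()
        matvec_tiled(A, x, tile)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best[1]:
            best = (tile, elapsed)

    print("best tile size: %d  (%.1f ms)" % (best[0], best[1] * 1000.0))

In the framework the paper describes, each point in the space would instead be handed to the underlying polyhedral code generator, compiled, and run on the target (for example as OpenMP or CUDA code), but the explore-measure-select structure is the same.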






Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hall, M., Chame, J., Chen, C., Shin, J., Rudy, G., Khan, M.M. (2010). Loop Transformation Recipes for Code Generation and Auto-Tuning. In: Gao, G.R., Pollock, L.L., Cavazos, J., Li, X. (eds) Languages and Compilers for Parallel Computing. LCPC 2009. Lecture Notes in Computer Science, vol 5898. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13374-9_4


  • DOI: https://doi.org/10.1007/978-3-642-13374-9_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13373-2

  • Online ISBN: 978-3-642-13374-9

  • eBook Packages: Computer Science, Computer Science (R0)
