research-article

Automatic creation of tile size selection models

Authors:
Tomofumi Yuki

Colorado State University, Fort Collins, CO, USA

Lakshminarayanan Renganarayanan

IBM T.J. Watson Research Center, Yorktown Heights, NY, USA

Sanjay Rajopadhye

Colorado State University, Fort Collins, CO, USA

Charles Anderson

Colorado State University, Fort Collins, CO, USA

Alexandre E. Eichenberger

IBM T.J. Watson Research Center, Yorktown Heights, NY, USA

Kevin O'Brien

IBM T.J. Watson Research Center, Yorktown Heights, NY, USA


CGO '10: Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization, April 2010, Pages 190–199
https://doi.org/10.1145/1772954.1772982

Published: 24 April 2010

ABSTRACT

Tiling is a widely used loop transformation for exposing and exploiting parallelism and data locality. Effective use of tiling requires selecting and tuning the tile sizes. This is usually achieved by hand-crafting tile size selection (TSS) models that characterize the performance of the tiled program as a function of tile sizes. The best tile sizes are then selected either by using the TSS model directly or by combining the TSS model with an empirical search. Hand-crafting accurate TSS models is hard, and adapting them to a different architecture or compiler, or even keeping them up to date as a single compiler evolves, is often just as hard. Instead of hand-crafting TSS models, can we automatically learn or create them? In this paper, we show that for a specific class of programs, fairly accurate TSS models can be created automatically by combining simple program features, synthetic kernels, and standard machine learning techniques. The automatic TSS model generation scheme can also be used directly to adapt the model and to keep it up to date. We evaluate our scheme on six different architecture-compiler combinations (chosen from three different architectures and four different compilers). The models learned by our method have consistently shown near-optimal performance (within 5% of the optimal on average) across all architecture-compiler combinations.
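To make the role of tile sizes concrete, the following is a minimal illustrative sketch of loop tiling applied to matrix multiplication. It is not taken from the paper: the function names `matmul_naive` and `matmul_tiled` and the tile-size parameters `ti`, `tj`, `tk` are assumptions chosen for illustration. The tiled version computes the same result as the untiled one; only the iteration order changes, which is what affects locality on real hardware.

```python
def matmul_naive(a, b):
    """Reference triple loop: c = a * b for list-of-lists matrices."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, ti, tj, tk):
    """Same computation, iterated tile by tile.

    ti, tj, tk are hypothetical tile sizes for the i, j, k loops.
    The min() bounds handle tiles that overhang the matrix edges.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    # Outer loops step from tile origin to tile origin.
    for ii in range(0, n, ti):
        for jj in range(0, p, tj):
            for kk in range(0, m, tk):
                # Inner loops visit the points inside one tile.
                for i in range(ii, min(ii + ti, n)):
                    for j in range(jj, min(jj + tj, p)):
                        for k in range(kk, min(kk + tk, m)):
                            c[i][j] += a[i][k] * b[k][j]
    return c


if __name__ == "__main__":
    a = [[i * 7 + j for j in range(5)] for i in range(6)]
    b = [[i - 2 * j for j in range(4)] for i in range(5)]
    # Any positive tile sizes give the same product as the naive loop.
    print(matmul_tiled(a, b, 2, 3, 4) == matmul_naive(a, b))
```

In this framing, a TSS model is whatever chooses the triple `(ti, tj, tk)`. The abstract describes learning that choice from program features rather than deriving it by hand.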

Index Terms

Automatic creation of tile size selection models
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published in
CGO '10: Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
April 2010
300 pages
ISBN: 9781605586359
DOI: 10.1145/1772954
General Chairs: Andreas Moshovos (University of Toronto), Greg Steffan (University of Toronto)
Program Chairs: Kim Hazelwood (University of Virginia), David Kaeli (Northeastern University)
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
machine learning
neural network
performance modeling
tiling
Acceptance Rates
Overall Acceptance Rate: 312 of 1,061 submissions, 29%
Article Metrics
- Total Citations: 44
- Total Downloads: 527
- Downloads (Last 12 months): 14
- Downloads (Last 6 weeks): 1