DOI: 10.1145/1772954.1772982
research-article

Automatic creation of tile size selection models

Published: 24 April 2010

ABSTRACT

Tiling is a widely used loop transformation for exposing and exploiting parallelism and data locality. Using tiling effectively requires selecting and tuning the tile sizes. This is usually done with hand-crafted tile size selection (TSS) models that characterize the performance of the tiled program as a function of the tile sizes. The best tile sizes are then selected either directly from the TSS model or by combining the model with an empirical search. Hand-crafting accurate TSS models is hard, and adapting them to a different architecture or compiler, or even keeping them up to date as a single compiler evolves, is often just as hard. Instead of hand-crafting TSS models, can we learn or create them automatically? In this paper, we show that, for a specific class of programs, fairly accurate TSS models can be created automatically by combining simple program features, synthetic kernels, and standard machine learning techniques. The same automatic generation scheme can also be used directly to adapt a model or to keep it up to date. We evaluate our scheme on six architecture-compiler combinations, chosen from three architectures and four compilers. The learned models consistently deliver near-optimal performance (within 5% of the optimal on average) across all architecture-compiler combinations.
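The abstract describes two ways of using a TSS model: selecting tile sizes directly from it, or combining it with an empirical search. The Python sketch below illustrates the second use on a tiled matrix multiply. It is not the paper's method: `toy_tss_model` is a hypothetical hand-written cost function standing in for a learned model, and `select_tile_sizes`, `time_tile`, the candidate grid, and the assumed cache capacity of 4096 elements are all illustrative assumptions.

```python
import random
import time
from itertools import product
from typing import Callable, Iterable, List, Tuple

Matrix = List[List[float]]
Tile = Tuple[int, int, int]  # (ti, tj, tk)


def tiled_matmul(a: Matrix, b: Matrix, ti: int, tj: int, tk: int) -> Matrix:
    """C = A * B for square n x n matrices, computed tile by tile.

    The outer three loops walk over ti x tj x tk tiles; the inner three loops
    do the work inside one tile. min() handles partial tiles at the edges
    when n is not a multiple of the tile size.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, ti):
        for jj in range(0, n, tj):
            for kk in range(0, n, tk):
                for i in range(ii, min(ii + ti, n)):
                    for k in range(kk, min(kk + tk, n)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + tj, n)):
                            c[i][j] += aik * b[k][j]
    return c


def toy_tss_model(tile: Tile, cache_elems: int = 4096) -> float:
    """HYPOTHETICAL stand-in for a learned TSS model (lower is better).

    Penalizes tiles whose working set (one tile each of A, B and C) overflows
    an assumed cache capacity, and mildly penalizes tiny tiles for loop
    overhead. A learned model would instead be fitted from program features.
    """
    ti, tj, tk = tile
    working_set = ti * tk + tk * tj + ti * tj
    overflow = max(0, working_set - cache_elems)
    return overflow + 1000.0 / (ti * tj * tk)


def select_tile_sizes(
    candidates: Iterable[Tile],
    model: Callable[[Tile], float],
    measure: Callable[[Tile], float],
    top_k: int = 3,
) -> Tile:
    """Model-guided empirical search: rank every candidate with the model,
    then run (measure) only the top_k predictions and keep the fastest."""
    ranked = sorted(candidates, key=model)
    shortlist = ranked[:top_k]
    return min(shortlist, key=measure)


def time_tile(a: Matrix, b: Matrix, tile: Tile) -> float:
    """Wall-clock time of one tiled_matmul run with the given tile sizes."""
    start = time.perf_counter()
    tiled_matmul(a, b, *tile)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 32
    a = [[random.random() for _ in range(n)] for _ in range(n)]
    b = [[random.random() for _ in range(n)] for _ in range(n)]
    candidates = list(product([4, 8, 16, 32], repeat=3))
    best = select_tile_sizes(
        candidates, toy_tss_model, lambda t: time_tile(a, b, t), top_k=3
    )
    print("selected tile sizes:", best)
```

The model narrows the search so that only `top_k` candidates are executed rather than the whole grid; setting `top_k=1` corresponds to using the model directly, with no empirical search.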

    Published in

      ACM Conferences
      CGO '10: Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
      April 2010
      300 pages
      ISBN: 9781605586359
      DOI: 10.1145/1772954

      Copyright © 2010 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall acceptance rate: 312 of 1,061 submissions, 29%
