Improving compiler scalability: optimizing large programs at small price (PLDI '15)

ABSTRACT
Compiler scalability is a well-known problem: reasoning about the application of useful optimizations over large program scopes consumes too much time and memory during compilation. This problem is exacerbated in polyhedral compilers, which use powerful yet costly integer programming algorithms to compose loop optimizations. As a result, the benefits that a polyhedral compiler can offer to real-world programs, such as scientific applications containing long sequences of loop nests, remain out of reach for most users. In this work, we address this scalability problem in polyhedral compilers. We identify three causes of unscalability, each of which stems from the large number of statements and dependences in the program scope. We propose a one-shot solution to the problem: reducing the effective number of statements and dependences as seen by the compiler. We achieve this by representing each sequence of statements in a program by a single super-statement. The resulting set of super-statements exposes the minimum sufficient constraints to the Integer Linear Programming (ILP) solver for finding correct optimizations. We implement our approach in the PLuTo polyhedral compiler and find that it condenses the program statements and program dependences by factors of 4.7x and 6.4x, respectively, averaged over 9 hot regions (ranging from 48 to 121 statements) in 5 real applications. As a result, compilation time and memory requirements improve by 268x and 20x, respectively, over the latest version of the PLuTo compiler. The final compile times are comparable to those of the Intel compiler, while the generated code performs 1.92x better on average due to the latter's conservative approach to loop optimization.
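The condensation idea can be illustrated with a toy sketch. This is not the paper's actual algorithm, which operates on polyhedral dependences; the grouping criterion used here (maximal runs of consecutive statements belonging to the same loop nest) and all names are illustrative assumptions. The sketch only shows the bookkeeping effect the abstract describes: fewer statements and fewer dependence edges reach the ILP-based optimizer.

```python
# Toy sketch (illustrative, not PLuTo's condensation algorithm):
# group consecutive statements of the same loop nest into a
# "super-statement" and collapse duplicate dependence edges.
from itertools import groupby

def condense(statements, deps):
    """statements: list of (stmt_id, loop_nest_id) in program order.
    deps: set of (src_stmt, dst_stmt) dependence edges.
    Returns (super_statements, super_deps)."""
    # Group maximal runs of statements sharing a loop nest
    # (groupby groups consecutive items, matching program order).
    stmt_to_super = {}
    supers = []
    for _nest, run in groupby(statements, key=lambda s: s[1]):
        members = [sid for sid, _ in run]
        for sid in members:
            stmt_to_super[sid] = len(supers)
        supers.append(members)
    # Keep one edge per pair of distinct super-statements;
    # edges internal to a super-statement disappear.
    super_deps = {(stmt_to_super[a], stmt_to_super[b])
                  for a, b in deps
                  if stmt_to_super[a] != stmt_to_super[b]}
    return supers, super_deps

stmts = [("S0", 0), ("S1", 0), ("S2", 0), ("S3", 1), ("S4", 1)]
edges = {("S0", "S1"), ("S1", "S2"), ("S0", "S2"),
         ("S2", "S3"), ("S3", "S4")}
supers, sdeps = condense(stmts, edges)
print(len(stmts), len(edges))   # 5 statements, 5 dependences
print(len(supers), len(sdeps))  # 2 super-statements, 1 dependence
```

Here five statements with five dependences condense to two super-statements linked by a single dependence, which is the kind of reduction (4.7x fewer statements, 6.4x fewer dependences on real hot regions) the abstract reports.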