DOI: 10.1145/1088149.1088169

Facilitating the search for compositions of program transformations

Published: 20 June 2005

Abstract

Static compiler optimizations can hardly cope with the complex run-time behavior and hardware component interplay of modern processor architectures. Multiple architectural phenomena occur and interact simultaneously, which requires the optimizer to combine multiple program transformations. Whether these transformations are selected through static analysis and models, runtime feedback, or both, the underlying infrastructure must have the ability to perform long and complex compositions of program transformations in a flexible manner. Existing compilers are ill-equipped to perform that task because of rigid phase ordering, fragile selection rules using pattern matching, and cumbersome expression of loop transformations on syntax trees. Moreover, iterative optimization emerges as a pragmatic and general means to select an optimization strategy via machine learning and operations research. Searching for the composition of dozens of complex, dependent, parameterized transformations is a challenge for iterative approaches.

The purpose of this article is threefold: (1) to facilitate the automatic search for compositions of program transformations, introducing a richer framework which improves on classical polyhedral representations and is suitable for iterative optimization on a simpler, structured search space, (2) to illustrate, using several examples, that syntactic code representations close to the operational semantics hamper the composition of transformations, and (3) to show that complex compositions of transformations can be necessary to achieve significant performance benefits.

The proposed framework relies on a unified polyhedral representation of loops and statements. The key is to clearly separate four types of actions associated with program transformations: modifications of the iteration domain, the schedule, the data layout, and the memory access functions.
The framework is implemented within the Open64/ORC compiler, aiming for native IA64, AMD64 and IA32 code generation, along with source-to-source optimization of Fortran90, C and C++.
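
The separation of concerns described above can be pictured with a toy model. The following Python sketch is only an illustration under simplifying assumptions; it is not the paper's formalism or its Open64/ORC implementation. The names `Statement`, `interchange`, `skew`, and `execution_order` are hypothetical, the iteration domain is restricted to a rectangular box (a special case of a polyhedron), and the data layout component is omitted for brevity. It shows that a loop transformation can be expressed as a change to the schedule matrix alone, so that composing transformations reduces to multiplying matrices while the domain and access functions stay untouched.

```python
from dataclasses import dataclass, replace
from itertools import product

# Hypothetical, simplified model of one statement in a loop nest.
# All names here are illustrative assumptions, not taken from the paper.
@dataclass(frozen=True)
class Statement:
    bounds: tuple    # iteration domain: inclusive (lo, hi) per loop dimension
    schedule: tuple  # integer matrix: iteration vector -> logical time vector
    access: tuple    # integer matrix: iteration vector -> array subscripts

def matvec(m, v):
    """Multiply matrix m (tuple of rows) by vector v."""
    return tuple(sum(a * b for a, b in zip(row, v)) for row in m)

def matmul(a, b):
    """Multiply two matrices given as tuples of rows."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(len(b)))
              for j in range(len(b[0])))
        for i in range(len(a))
    )

def interchange(stmt):
    """Swap the two time dimensions; only the schedule changes."""
    swap = ((0, 1), (1, 0))
    return replace(stmt, schedule=matmul(swap, stmt.schedule))

def skew(stmt, factor):
    """Skew the inner time dimension by the outer one; only the schedule changes."""
    sk = ((1, 0), (factor, 1))
    return replace(stmt, schedule=matmul(sk, stmt.schedule))

def execution_order(stmt):
    """Enumerate the domain points sorted by their logical time vector."""
    points = product(*(range(lo, hi + 1) for lo, hi in stmt.bounds))
    return sorted(points, key=lambda p: matvec(stmt.schedule, p))

identity = ((1, 0), (0, 1))
s = Statement(bounds=((0, 1), (0, 2)), schedule=identity, access=identity)

# Compose two transformations; the result is a single schedule matrix.
t = skew(interchange(s), 1)
```

In this sketch, `t.bounds` and `t.access` are identical to those of `s`: the composition only rewrote the schedule. A search procedure could therefore explore sequences of such steps without re-deriving the loop structure after each one, which is the kind of property the abstract attributes to a non-syntactic representation.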


Published In

ICS '05: Proceedings of the 19th annual international conference on Supercomputing
June 2005
414 pages
ISBN:1595931678
DOI:10.1145/1088149

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

ICS05: International Conference on Supercomputing 2005
June 20 - 22, 2005
Cambridge, Massachusetts

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%
