Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation

Knijnenburg, P. M. W.; Kisuki, T.; O'Boyle, M. F. P.

doi:10.1023/A:1020989410030

Published: January 2003

Volume 24, pages 43–67, (2003)

The Journal of Supercomputing

Abstract

Loop tiling and unrolling are two important program transformations to exploit locality and expose instruction-level parallelism, respectively. However, these transformations are not independent and each can adversely affect the goal of the other. Furthermore, the best combination will vary dramatically from one processor to the next. In this paper, we therefore address the problem of how to select tile sizes and unroll factors simultaneously. We approach this problem in an architecturally adaptive manner by means of iterative compilation, where we generate many versions of a program and decide upon the best by actually executing them and measuring their execution time. We evaluate several iterative strategies based on genetic algorithms, random sampling and simulated annealing. We compare the levels of optimization obtained by iterative compilation to several well-known static techniques and show that we outperform each of them on a range of benchmarks across a variety of architectures. Finally, we show how to quantitatively trade off the number of profiles needed against the level of optimization that can be reached. In this way, we can reach high levels of optimization within 50 iterations.
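The search loop described in the abstract can be illustrated with a minimal sketch of the random-sampling strategy, written here in Python. This is not the authors' implementation: the function names `random_search` and `toy_cost`, the search ranges, and the cost model are all assumptions made for illustration. In a real setting, `evaluate` would compile the program with the given tile size and unroll factor, run it, and return the measured execution time.

```python
import random
from typing import Callable, Iterable, Optional, Tuple

Config = Tuple[int, int]  # (tile_size, unroll_factor)


def random_search(
    evaluate: Callable[[int, int], float],
    tile_sizes: Iterable[int],
    unroll_factors: Iterable[int],
    budget: int = 50,
    seed: int = 0,
) -> Tuple[Optional[Config], float]:
    """Evaluate up to `budget` distinct (tile, unroll) points; keep the fastest."""
    rng = random.Random(seed)
    # The combined search space: every pairing of tile size and unroll factor.
    space = [(t, u) for t in tile_sizes for u in unroll_factors]
    candidates = rng.sample(space, min(budget, len(space)))

    best_cfg, best_time = None, float("inf")
    for tile, unroll in candidates:
        # Stand-in for "compile this version, execute it, measure its time".
        elapsed = evaluate(tile, unroll)
        if elapsed < best_time:
            best_cfg, best_time = (tile, unroll), elapsed
    return best_cfg, best_time


def toy_cost(tile: int, unroll: int) -> float:
    """Hypothetical cost model, NOT taken from the paper.

    The product term makes the two parameters interact, mimicking the
    observation that tiling and unrolling are not independent.
    """
    return abs(tile - 40) / 40 + abs(unroll - 4) / 4 + 0.0001 * tile * unroll


if __name__ == "__main__":
    cfg, t = random_search(toy_cost, range(1, 101), range(1, 21), budget=50)
    print("best (tile, unroll):", cfg, "cost:", round(t, 3))
```

The `budget` parameter corresponds to the number of profiles: raising it trades more executions for a better chance of finding a fast configuration. The default of 50 mirrors the iteration count mentioned in the abstract. Genetic algorithms or simulated annealing would replace the uniform sampling step with a guided choice of the next candidate.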

Author information

Authors and Affiliations

Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, the Netherlands
P. M. W. Knijnenburg & T. Kisuki
Institute for Computing Systems Architecture, Edinburgh University, Mayfield Road, Edinburgh, EH9 3JZ, UK
M. F. P. O'Boyle

About this article

Cite this article

Knijnenburg, P.M.W., Kisuki, T. & O'Boyle, M.F.P. Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation. The Journal of Supercomputing 24, 43–67 (2003). https://doi.org/10.1023/A:1020989410030

Issue Date: January 2003
DOI: https://doi.org/10.1023/A:1020989410030
