
Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation

The Journal of Supercomputing

Abstract

Loop tiling and unrolling are two important program transformations, used to exploit data locality and to expose instruction-level parallelism, respectively. However, these transformations are not independent, and each can adversely affect the goal of the other. Furthermore, the best combination varies dramatically from one processor to the next. In this paper, we therefore address the problem of how to select tile sizes and unroll factors simultaneously. We approach this problem in an architecturally adaptive manner by means of iterative compilation, in which we generate many versions of a program and select the best one by actually executing each version and measuring its execution time. We evaluate several iterative search strategies based on genetic algorithms, random sampling and simulated annealing. We compare the level of optimization obtained by iterative compilation with that of several well-known static techniques and show that iterative compilation outperforms each of them on a range of benchmarks across a variety of architectures. Finally, we show how to quantitatively trade off the number of profiles needed against the level of optimization that can be reached. In this way, high levels of optimization are reached within 50 iterations.
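The search loop behind iterative compilation can be illustrated with a minimal Python sketch over a (tile size, unroll factor) space. It covers two of the strategies named above, random sampling and simulated annealing, and limits each to a budget of 50 evaluations. The sketch rests on loudly stated assumptions. `measure(tile, unroll)` is a hypothetical toy cost function that stands in for compiling, running and timing one program version. The search space, cache capacity, penalty weights, cooling schedule and neighbourhood are all invented for illustration. None of this is the paper's implementation, and the genetic-algorithm strategy is omitted.

```python
import math
import random

# Hypothetical search space (illustrative only, not the paper's).
TILE_SIZES = list(range(4, 129, 4))   # 4, 8, ..., 128
UNROLL_FACTORS = list(range(1, 17))   # 1, 2, ..., 16
CACHE_ELEMS = 4096                    # hypothetical cache capacity in array elements


def measure(tile, unroll):
    """Hypothetical stand-in for 'compile this version, run it, return its time'.

    A toy cost model, not a real one. It only mimics the interplay described
    in the abstract: tiles whose working set overflows the cache cause misses,
    small tiles and small unroll factors cost loop overhead, large unroll
    factors cause register spills, and an unroll factor that does not divide
    the tile size leaves a remainder loop.
    """
    working_set = 3 * tile * tile  # e.g. three tiled arrays in matrix multiply
    misses = max(0, working_set - CACHE_ELEMS) / CACHE_ELEMS
    overhead = 1.0 / unroll + 1.0 / tile
    spills = 0.05 * max(0, unroll - 8)
    remainder = 0.1 if tile % unroll else 0.0
    return 1.0 + misses + overhead + spills + remainder


def random_search(budget=50, seed=0):
    """Random sampling: time `budget` uniformly drawn versions, keep the fastest."""
    rng = random.Random(seed)
    best_cfg, best_cost = None, math.inf
    for _ in range(budget):
        cfg = (rng.choice(TILE_SIZES), rng.choice(UNROLL_FACTORS))
        cost = measure(*cfg)  # duplicates are re-measured; a real tool could cache
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost


def neighbour(cfg, rng):
    """Step the tile size or the unroll factor one position up or down."""
    ti, ui = TILE_SIZES.index(cfg[0]), UNROLL_FACTORS.index(cfg[1])
    step = rng.choice((-1, 1))
    if rng.random() < 0.5:
        ti = min(max(ti + step, 0), len(TILE_SIZES) - 1)
    else:
        ui = min(max(ui + step, 0), len(UNROLL_FACTORS) - 1)
    return TILE_SIZES[ti], UNROLL_FACTORS[ui]


def simulated_annealing(budget=50, seed=0, temp=0.5, cooling=0.9):
    """Simulated annealing using exactly `budget` calls to measure()."""
    rng = random.Random(seed)
    cur = (rng.choice(TILE_SIZES), rng.choice(UNROLL_FACTORS))
    cur_cost = measure(*cur)
    best_cfg, best_cost = cur, cur_cost
    for _ in range(budget - 1):
        cand = neighbour(cur, rng)
        cost = measure(*cand)
        # Always accept a faster version; accept a slower one with probability
        # exp(-(cost - cur_cost) / temp) so the search can escape local minima.
        if cost < cur_cost or rng.random() < math.exp((cur_cost - cost) / temp):
            cur, cur_cost = cand, cost
        if cost < best_cost:
            best_cfg, best_cost = cand, cost
        temp *= cooling
    return best_cfg, best_cost


def exhaustive():
    """Time every version: the reference optimum of this toy model."""
    cfgs = [(t, u) for t in TILE_SIZES for u in UNROLL_FACTORS]
    best = min(cfgs, key=lambda c: measure(*c))
    return best, measure(*best)


if __name__ == "__main__":
    print("exhaustive:", exhaustive())  # exhaustive: ((32, 8), 1.15625)
    print("random    :", random_search(50, seed=1))
    print("annealing :", simulated_annealing(50, seed=1))
```

Exhaustive search over all 512 configurations gives the reference optimum for this toy model. In a real setting each `measure` call is a full compile-and-run. The number of evaluations, not the search logic, therefore dominates the cost, which is why the trade-off between profiles and optimization level matters.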



About this article

Cite this article

Knijnenburg, P.M.W., Kisuki, T. & O'Boyle, M.F.P. Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation. The Journal of Supercomputing 24, 43–67 (2003). https://doi.org/10.1023/A:1020989410030

  • DOI: https://doi.org/10.1023/A:1020989410030
