Abstract
In many cases, the simple analytical models used by traditional compilers can no longer produce well-optimized code for complex programs, because processor architectures have themselves become enormously complex. Search-based empirical methods are a promising alternative. The success of empirically tuned library generators such as ATLAS has shown that this strategy can be effective for domain-specific programs. To date, however, there has been no general-purpose tool for effective empirical optimization of whole programs. The main obstacle has been the need to evaluate a prohibitively large number of alternative program variants. To address this problem, we have developed a prototype tool for automatic application tuning that uses loop-level performance feedback and a direct search strategy to guide the search for the best set of optimization parameters. Experiments on four different architectures show that direct search can be an effective technique for finding good values for transformation parameters in a reasonable time.
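To make the idea of direct search over transformation parameters concrete, the following is a minimal sketch of a generic compass-style pattern search on integer parameters. It is not the tool described in this article. The two parameters (a tile size and an unroll factor), their bounds, and the function `synthetic_cost` are assumptions chosen for illustration; in a real tuner the cost would be a measured running time for a compiled program variant rather than a formula.

```python
from typing import Callable, Dict, Sequence, Tuple

Point = Tuple[int, ...]


def synthetic_cost(params: Point) -> float:
    """Hypothetical stand-in for a measured run time.

    Assumes the best tile size is 48 and the best unroll factor is 4.
    A real tuner would compile and time a program variant instead.
    """
    tile, unroll = params
    return float((tile - 48) ** 2 + 4 * (unroll - 4) ** 2)


def pattern_search(
    cost: Callable[[Point], float],
    start: Sequence[int],
    lower: Sequence[int],
    upper: Sequence[int],
    step: int = 8,
) -> Tuple[Point, float, int]:
    """Compass-style direct search over bounded integer parameters.

    Probes +/- step along each parameter, moves to any strictly better
    point, and halves the step when no probe improves the current best.
    Returns (best point, best cost, number of distinct evaluations).
    """
    cache: Dict[Point, float] = {}

    def evaluate(point: Point) -> float:
        # Each distinct variant is evaluated once; repeats hit the cache.
        if point not in cache:
            cache[point] = cost(point)
        return cache[point]

    best: Point = tuple(start)
    best_cost = evaluate(best)

    while step >= 1:
        improved = False
        for dim in range(len(best)):
            for delta in (step, -step):
                candidate = list(best)
                # Clamp the probe so it stays inside the legal range.
                candidate[dim] = min(max(candidate[dim] + delta, lower[dim]), upper[dim])
                probe = tuple(candidate)
                probe_cost = evaluate(probe)
                if probe_cost < best_cost:
                    best, best_cost = probe, probe_cost
                    improved = True
        if not improved:
            step //= 2  # refine the search; the loop ends once step reaches 0

    return best, best_cost, len(cache)


best, best_cost, evaluations = pattern_search(
    synthetic_cost, start=(16, 1), lower=(1, 1), upper=(128, 16)
)
print(best, best_cost, evaluations)
```

The search needs no derivatives and no model of the machine, only the ability to compare the cost of two variants. On this synthetic cost it reaches the optimum after evaluating a small fraction of the 128 × 16 candidate settings, which is the property that makes direct search attractive when each evaluation means compiling and running a program.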
Additional information
This material is based on work supported by the Department of Energy under Contract Nos. 03891-001-99-4G, 74837-001-03 49, 86192-001-04 49, and/or 12783-001-05 49 from the Los Alamos National Laboratory.
About this article
Cite this article
Qasem, A., Kennedy, K. & Mellor-Crummey, J. Automatic tuning of whole applications using direct search and a performance-based transformation system. J Supercomput 36, 183–196 (2006). https://doi.org/10.1007/s11227-006-7957-2
DOI: https://doi.org/10.1007/s11227-006-7957-2