Abstract
The increasing complexity of hardware mechanisms in recent processors makes high performance code generation very challenging. One of the main issues for high performance is the optimization of memory accesses. General purpose compilers, which have no knowledge of the application context and only an approximate memory model, seem inappropriate for this task. Combining application-dependent optimizations on the source code with exploration of optimization parameters, as achieved with ATLAS, has been shown to be one way to improve performance. Yet hand-tuned codes such as those in the MKL library still outperform ATLAS by a significant margin, and some effort is needed to bridge the gap between the performance obtained by automatic and manual optimizations.
In this paper, a new iterative compilation approach for the generation of high performance codes is proposed. Unlike ATLAS, this approach is not application-dependent. The idea is to separate the memory optimization phase from the computation optimization phase. The first step automatically finds all possible decompositions of the code into kernels. With datasets that fit into the cache and simplified memory accesses, these kernels are easier to optimize, whether by the compiler, at source level, or with a dedicated code generator. The best decomposition is then found by a model-guided approach that performs the required memory optimizations on the source code.
Exploration of optimization sequences and their parameters is achieved with a meta-compilation language, X language. First results on linear algebra codes for the Itanium show that the performance obtained narrows the gap with that of highly optimized hand-tuned codes.
References
Alias, C., Barthou, D.: On Domain Specific Languages Re-Engineering. In: Glück, R., Lowry, M. (eds.) GPCE 2005. LNCS, vol. 3676, pp. 63–77. Springer, Heidelberg (2005)
Bodin, F., Mevel, Y., Quiniou, R.: A user level program transformation tool. In: ACM Int. Conf. on Supercomputing, Melbourne, Australia, pp. 180–187. ACM Press, New York (1998), doi:10.1145/277830.277868
Clauss, P.: Counting solutions to linear and nonlinear constraints through Ehrhart polynomials: Applications to analyze and transform scientific programs. In: ACM Int. Conf. on Supercomputing, pp. 278–295. ACM Press, New York (1996)
Coleman, S., McKinley, K.S.: Tile size selection using cache organization and data layout. In: ACM Conf. on Programming Language Design and Implementation, La Jolla, California, United States, pp. 279–290. ACM Press, New York (1995), doi:10.1145/207110.207162
Cooper, K.D., Waterman, T.: Investigating Adaptive Compilation using the MIPSPro Compiler. In: Symp. of the Los Alamos Computer Science Institute (October 2003)
Djoudi, L., et al.: Exploring application performance: a new tool for a static/dynamic approach. In: Symp. of the Los Alamos Computer Science Institute, Santa Fe, NM (October 2005)
Donadio, S., et al.: A language for the Compact Representation of Multiple Program Versions. In: Ayguadé, E., et al. (eds.) LCPC 2005. LNCS, vol. 4339, Springer, Heidelberg (2006)
IBM: Engineering and Scientific Subroutine Library. Guide and Reference
Feautrier, P.: Dataflow analysis of scalar and array references. Int. J. of Parallel Programming 20(1), 23–53 (1991)
Fraguela, B., Doallo, R., Zapata, E.: Automatic analytical modeling for the estimation of cache misses. In: Int. Conf. on Parallel Architectures and Compilation Techniques, Washington, DC, USA, p. 221. IEEE Computer Society Press, Los Alamitos (1999)
Goto, K., van de Geijn, R.: On reducing TLB misses in matrix multiplication. Technical report, The University of Texas at Austin, Department of Computer Sciences (2002)
Jalby, W., Lemuet, C., Le Pasteur, X.: WBTK: a new set of microbenchmarks to explore memory system performance for scientific computing. Int. J. High Perform. Comput. Appl. 18(2), 211–224 (2004), doi:10.1177/1094342004038945
Kodukula, I., Ahmed, N., Pingali, K.: Data-centric multi-level blocking. In: ACM Conf. on Programming Language Design and Implementation, pp. 346–357. ACM, New York (1997), citeseer.ist.psu.edu/kodukula97datacentric.html
Kodukula, I., Pingali, K.: Transformations for imperfectly nested loops. In: ACM Int. Conf. on Supercomputing, Pittsburgh, Pennsylvania, United States, p. 12. IEEE Computer Society, Washington (1996), doi:10.1145/369028.369051
Metzger, R., Wen, Z.: Automatic Algorithm Recognition: A New Approach to Program Optimization. MIT Press, Cambridge (2000)
Intel: Intel Math Kernel Library (Intel MKL)
Triantafyllis, S., Vachharajani, M., August, D.I.: Compiler Optimization-Space Exploration. Journal of Instruction-level Parallelism (2005)
Whaley, R., Dongarra, J.: Automatically Tuned Linear Algebra Software (1997)
Wolfe, M.: Iteration space tiling for memory hierarchies. In: Conf. on Parallel Processing for Scientific Computing, pp. 357–361. Society for Industrial and Applied Mathematics, Philadelphia (1989)
CAPS entreprise, http://www.caps-entreprise.com
Yotov, K., et al.: Is Search Really Necessary to Generate High-Performance BLAS? (2005)
Copyright information
© 2007 Springer Berlin Heidelberg
Cite this paper
Barthou, D., Donadio, S., Duchateau, A., Jalby, W., Courtois, E. (2007). Iterative Compilation with Kernel Exploration. In: Almási, G., Caşcaval, C., Wu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2006. Lecture Notes in Computer Science, vol 4382. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72521-3_14
Print ISBN: 978-3-540-72520-6
Online ISBN: 978-3-540-72521-3