Abstract
Empirical optimizers like ATLAS have been very effective in optimizing computational kernels in libraries. The best choice of parameters such as tile size and degree of loop unrolling is determined by executing different versions of the computation. In contrast, optimizing compilers use a model-driven approach to program transformation. While the model-driven approach of optimizing compilers is generally orders of magnitude faster than ATLAS-like library generators, its effectiveness can be limited by the accuracy of the performance models used. In this paper, we describe an approach where a class of computations is modeled in terms of constituent operations that are empirically measured, thereby allowing modeling of the overall execution time. The performance model with empirically determined cost components is used to perform data layout optimization in the context of the Tensor Contraction Engine, a compiler for a high-level domain-specific language for expressing computational models in quantum chemistry. The effectiveness of the approach is demonstrated through experimental measurements on some representative computations from quantum chemistry.
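The idea of composing an overall execution-time estimate from empirically measured constituent operations can be illustrated with a minimal sketch. It is not the algorithm used in the paper or in the Tensor Contraction Engine. It assumes a computation made of a sequence of steps, a small set of candidate layouts per step, a table of measured per-step costs, and a measured cost for converting an array between layouts. The names `op_cost`, `relayout_cost`, `predicted_time`, and `best_layouts`, as well as all timing values, are hypothetical and chosen only for illustration.

```python
from itertools import product


def predicted_time(layouts, op_cost, relayout_cost):
    """Model the total time as the sum of measured per-step costs plus the
    measured cost of converting the layout between consecutive steps."""
    total = sum(op_cost[step][layout] for step, layout in enumerate(layouts))
    for prev, nxt in zip(layouts, layouts[1:]):
        if prev != nxt:
            total += relayout_cost[(prev, nxt)]
    return total


def best_layouts(op_cost, relayout_cost):
    """Exhaustively search all layout assignments and return the one with
    the lowest modeled time (only practical for small search spaces)."""
    candidates = product(*(sorted(costs) for costs in op_cost))
    return min(candidates,
               key=lambda ls: predicted_time(ls, op_cost, relayout_cost))


# Hypothetical measurements in seconds; a real table would be filled by
# timing each constituent operation on the target machine.
op_cost = [
    {"row": 1.0, "col": 3.0},   # step 0 is faster on a row-major operand
    {"row": 4.0, "col": 1.5},   # step 1 is faster on a column-major operand
]
relayout_cost = {("row", "col"): 0.5, ("col", "row"): 0.5}

choice = best_layouts(op_cost, relayout_cost)
total = predicted_time(choice, op_cost, relayout_cost)
print(choice, total)
```

With these made-up numbers, paying the conversion cost between the two steps is modeled as cheaper than keeping a single layout throughout. A model of this kind can only capture such a trade-off if the conversion cost is measured rather than assumed to be negligible.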
Supported in part by the National Science Foundation through the Information Technology Research program (CHE-0121676 and CHE-0121706), by NSF grant CCF-0073800 and by a grant from the Environmental Protection Agency.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lu, Q., Gao, X., Krishnamoorthy, S., Baumgartner, G., Ramanujam, J., Sadayappan, P. (2005). Empirical Performance-Model Driven Data Layout Optimization. In: Eigenmann, R., Li, Z., Midkiff, S.P. (eds) Languages and Compilers for High Performance Computing. LCPC 2004. Lecture Notes in Computer Science, vol 3602. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11532378_7
DOI: https://doi.org/10.1007/11532378_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28009-5
Online ISBN: 978-3-540-31813-2
eBook Packages: Computer Science (R0)