Abstract
We present a study of DGEMM implementations written in both the cache-oblivious and the cache-conscious programming styles. The cache-oblivious programs use recursion and automatically block the DGEMM operands A, B, and C for the memory hierarchy. The cache-conscious programs use iteration and explicitly block A, B, and C for the register file, each level of cache, and main memory. Our study shows that the cache-oblivious programs perform substantially worse than the cache-conscious programs. We discuss why this is so and suggest approaches for improving the performance of cache-oblivious programs.
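To make the two styles concrete, here is a minimal sketch of each for C += A·B. It is not the authors' code. It assumes row-major storage with leading dimensions `lda`, `ldb`, `ldc`. The names `dgemm_recursive`, `dgemm_tiled`, `leaf_kernel` and the constants `LEAF`, `MB`, `NB`, `KB` are hypothetical and chosen only for illustration. The recursive version halves the largest of m, n, k, which is one common cache-oblivious scheme. No cache size appears in it, so any blocking comes from the recursion itself. The iterative version tiles explicitly with fixed sizes meant for a single cache level.

```c
#include <stddef.h>

/* Hypothetical constants: LEAF stops the recursion; MB/NB/KB are
   illustrative tile sizes for one cache level, not tuned values. */
#define LEAF 32
#define MB 64
#define NB 64
#define KB 64

/* Simple loop nest: C[m x n] += A[m x k] * B[k x n], row-major. */
static void leaf_kernel(size_t m, size_t n, size_t k,
                        const double *A, size_t lda,
                        const double *B, size_t ldb,
                        double *C, size_t ldc)
{
    for (size_t i = 0; i < m; i++)
        for (size_t p = 0; p < k; p++) {
            double a = A[i * lda + p];
            for (size_t j = 0; j < n; j++)
                C[i * ldc + j] += a * B[p * ldb + j];
        }
}

/* Cache-oblivious style: split the largest dimension in half and recurse. */
void dgemm_recursive(size_t m, size_t n, size_t k,
                     const double *A, size_t lda,
                     const double *B, size_t ldb,
                     double *C, size_t ldc)
{
    if (m <= LEAF && n <= LEAF && k <= LEAF) {
        leaf_kernel(m, n, k, A, lda, B, ldb, C, ldc);
    } else if (m >= n && m >= k) {          /* split rows of A and C */
        size_t h = m / 2;
        dgemm_recursive(h, n, k, A, lda, B, ldb, C, ldc);
        dgemm_recursive(m - h, n, k, A + h * lda, lda, B, ldb,
                        C + h * ldc, ldc);
    } else if (n >= k) {                    /* split columns of B and C */
        size_t h = n / 2;
        dgemm_recursive(m, h, k, A, lda, B, ldb, C, ldc);
        dgemm_recursive(m, n - h, k, A, lda, B + h, ldb, C + h, ldc);
    } else {                                /* split the inner dimension k */
        size_t h = k / 2;
        dgemm_recursive(m, n, h, A, lda, B, ldb, C, ldc);
        dgemm_recursive(m, n, k - h, A + h, lda, B + h * ldb, ldb, C, ldc);
    }
}

/* Cache-conscious style: explicit loop tiling with fixed tile sizes. */
void dgemm_tiled(size_t m, size_t n, size_t k,
                 const double *A, size_t lda,
                 const double *B, size_t ldb,
                 double *C, size_t ldc)
{
    for (size_t i0 = 0; i0 < m; i0 += MB) {
        size_t mb = (m - i0 < MB) ? m - i0 : MB;
        for (size_t p0 = 0; p0 < k; p0 += KB) {
            size_t kb = (k - p0 < KB) ? k - p0 : KB;
            for (size_t j0 = 0; j0 < n; j0 += NB) {
                size_t nb = (n - j0 < NB) ? n - j0 : NB;
                leaf_kernel(mb, nb, kb,
                            A + i0 * lda + p0, lda,
                            B + p0 * ldb + j0, ldb,
                            C + i0 * ldc + j0, ldc);
            }
        }
    }
}
```

Both functions accumulate into C and compute the same product, differing only in the order in which they visit blocks. Neither sketch does register blocking, copying into contiguous buffers, or a multi-level tiling hierarchy. The cache-conscious programs described above rely on techniques of exactly this kind.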
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gunnels, J.A., Gustavson, F.G., Pingali, K., Yotov, K. (2007). Is Cache-Oblivious DGEMM Viable? In: Kågström, B., Elmroth, E., Dongarra, J., Waśniewski, J. (eds) Applied Parallel Computing. State of the Art in Scientific Computing. PARA 2006. Lecture Notes in Computer Science, vol 4699. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75755-9_109
DOI: https://doi.org/10.1007/978-3-540-75755-9_109
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75754-2
Online ISBN: 978-3-540-75755-9
eBook Packages: Computer Science, Computer Science (R0)