Abstract
Current many-core architectures (MCA) have a much larger ratio of arithmetic throughput to memory bandwidth than traditional processors (vector, superscalar, multi-core, etc.). As a result, memory bandwidth has become an important performance bottleneck of MCA. Previous work has demonstrated promising performance of MCA for dense matrix operations. However, there is still little quantitative understanding of the relationship between the performance of matrix computation kernels and the limited memory bandwidth. This paper presents a performance model for dense matrix multiplication (MM), LU decomposition, and Cholesky decomposition. The input parameters are the memory bandwidth \(B\) and the on-chip SRAM capacity \(C\); the output is the maximum core number \(P_{max}\). We show that \(P_{max}=\Theta(B \cdot \sqrt{C})\). That is, when the problem size is large enough, the given memory bandwidth will not be a performance bottleneck as long as the number of cores \(P < P_{max}\). The model is validated by comparing its theoretical performance with experimental data from previous work.
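The scaling of the core bound with bandwidth and the square root of SRAM capacity can be illustrated with a back-of-the-envelope calculation for blocked matrix multiplication. The sketch below is not the paper's model: it assumes a standard tiling argument in which three square tiles must fit on chip, and the function name `max_cores`, its parameters, and the constant factors are hypothetical choices made for illustration.

```python
import math


def max_cores(bandwidth_words_per_s, sram_words, core_flops_per_s):
    """Rough estimate of how many cores off-chip bandwidth can keep busy.

    Assumption (not from the paper): blocked MM with three b-by-b tiles
    (one each of A, B and C) resident in on-chip SRAM.
    """
    # Largest tile edge b such that 3 * b^2 <= sram_words.
    b = math.isqrt(sram_words // 3)
    # One tile update performs 2 * b^3 flops and loads two tiles
    # (2 * b^2 words), so the arithmetic intensity is b flops per word.
    intensity = b
    # Flop rate that the memory system can feed at this intensity.
    sustainable_flops = bandwidth_words_per_s * intensity
    # Cores beyond this count would be starved by memory bandwidth.
    return sustainable_flops / core_flops_per_s


# Hypothetical machine: 1e9 words/s of bandwidth, 1e9 flops/s per core.
small = max_cores(1e9, 3 * 256 * 256, 1e9)
large = max_cores(1e9, 3 * 512 * 512, 1e9)
print(small, large)
```

Under these assumptions, quadrupling the SRAM capacity doubles the estimate and doubling the bandwidth doubles it as well, which matches the \(\Theta(B \cdot \sqrt{C})\) behaviour stated in the abstract up to constant factors.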
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Long, G., Fan, D., Zhang, J., Song, F., Yuan, N., Lin, W. (2008). A Performance Model of Dense Matrix Operations on Many-Core Architectures. In: Luque, E., Margalef, T., Benítez, D. (eds) Euro-Par 2008 – Parallel Processing. Euro-Par 2008. Lecture Notes in Computer Science, vol 5168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85451-7_14
DOI: https://doi.org/10.1007/978-3-540-85451-7_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85450-0
Online ISBN: 978-3-540-85451-7
eBook Packages: Computer Science, Computer Science (R0)