COMPESCE: A Co-design Approach for Memory Subsystem Performance Analysis in HPC Many-Cores

Portero, Antoni; Falquez, Carlos; Ho, Nam; Petrakis, Polydoros; Nassyr, Stepan; Marazakis, Manolis; Dolbeau, Romain; Cifuentes, Jorge Alejandro Nocua; Alvarez, Luis Bertran; Pleiter, Dirk; Suarez, Estela

doi:10.1007/978-3-031-42785-5_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13949)

Included in the following conference series:

International Conference on Architecture of Computing Systems

Abstract

This paper explores memory subsystem design through gem5 simulations of a non-uniform memory access (NUMA) architecture with ARM cores that are equipped with vector engines and connected to a Network-on-Chip (NoC) following the Coherent Hub Interface (CHI) protocol. The study quantifies the benefits of vectorization, prefetching, and multichannel NoC configurations using a benchmark that generates memory patterns and indexed accesses. The outcomes provide insights into improving bus utilization and bandwidth and reducing stalls in the system. The paper proposes hardware/software (HW/SW) advancements to reach a utilization of the HBM device above 80% at the memory controllers in the simulated manycore system.
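The abstract does not show the benchmark itself, so the following is only a minimal sketch of the two ideas it names: an indexed (gather) access driven by a generated index pattern, and utilization expressed as achieved bandwidth over peak bandwidth. The function names, the uniform-stride pattern, and all inputs are hypothetical and chosen for illustration; they are not taken from the paper and do not reproduce its results.

```python
def strided_indices(count, stride):
    """Return `count` indices spaced `stride` elements apart (a uniform-stride pattern)."""
    return [i * stride for i in range(count)]


def gather(source, indices):
    """Indexed read: the i-th output element is source[indices[i]]."""
    return [source[j] for j in indices]


def utilization_percent(bytes_moved, seconds, peak_bytes_per_second):
    """Achieved bandwidth as a percentage of a caller-supplied peak bandwidth."""
    if seconds <= 0 or peak_bytes_per_second <= 0:
        raise ValueError("seconds and peak_bytes_per_second must be positive")
    return 100.0 * (bytes_moved / seconds) / peak_bytes_per_second


# Illustrative inputs only (assumptions, not measurements from the paper).
source = list(range(64))
pattern = strided_indices(count=8, stride=4)
gathered = gather(source, pattern)
```

In a simulator-based study, the byte count and elapsed time would come from memory-controller statistics rather than from a Python loop; the sketch only fixes the definition of the metric.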

Acknowledgment

This work has been performed in the context of the European Processor Initiative (EPI) project, which has received funding from the European Union’s Horizon 2020 research and innovation program under Grant Agreement №101036168 (EPI-SGA2).

Author information

Authors and Affiliations

Jülich Supercomputing Centre, Novel System Architectures Design, Forschungszentrum Jülich GmbH, Jülich, Germany
Antoni Portero, Carlos Falquez, Nam Ho, Stepan Nassyr & Estela Suarez
Institute of Computer Science, Foundation for Research and Technology - Hellas (FORTH), Heraklion, Greece
Polydoros Petrakis & Manolis Marazakis
SiPearl, Rennes, France
Romain Dolbeau
ATOS, Les Clayes-sous-Bois, France
Jorge Alejandro Nocua Cifuentes & Luis Bertran Alvarez
KTH, Royal Institute of Technology, Stockholm, Sweden
Dirk Pleiter

Corresponding author

Correspondence to Antoni Portero.

Editor information

Editors and Affiliations

National Technical University of Athens, Athens, Greece
Georgios Goumas
Kiel University, Kiel, Germany
Sven Tomforde
Gottfried Wilhelm Leibniz Universität Hannover, Hannover, Germany
Jürgen Brehm
Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
Stefan Wildermann
Otto-von-Guericke University Magdeburg, Magdeburg, Germany
Thilo Pionteck

About this paper

Cite this paper

Portero, A. et al. (2023). COMPESCE: A Co-design Approach for Memory Subsystem Performance Analysis in HPC Many-Cores. In: Goumas, G., Tomforde, S., Brehm, J., Wildermann, S., Pionteck, T. (eds) Architecture of Computing Systems. ARCS 2023. Lecture Notes in Computer Science, vol 13949. Springer, Cham. https://doi.org/10.1007/978-3-031-42785-5_8

DOI: https://doi.org/10.1007/978-3-031-42785-5_8
Published: 26 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42784-8
Online ISBN: 978-3-031-42785-5
eBook Packages: Computer Science, Computer Science (R0)
