COMPESCE: A Co-design Approach for Memory Subsystem Performance Analysis in HPC Many-Cores

  • Conference paper
Architecture of Computing Systems (ARCS 2023)

Abstract

This paper explores memory subsystem design through gem5 simulations of a non-uniform memory access (NUMA) architecture with ARM cores that are equipped with vector engines and connected to a Network-on-Chip (NoC) following the Coherent Hub Interface (CHI) protocol. The study quantifies the benefits of vectorization, prefetching, and multichannel NoC configurations using a benchmark that generates memory access patterns, including indexed accesses. The outcomes provide insights into improving bus utilization and bandwidth and reducing stalls in the system. The paper proposes hardware/software (HW/SW) advancements that let the simulated manycore system achieve more than 80% utilization of the HBM device at the memory controllers.

Acknowledgment

This work has been performed in the context of the European Processor Initiative (EPI) project, which has received funding from the European Union’s Horizon 2020 research and innovation program under Grant Agreement №101036168 (EPI-SGA2).

Author information

Corresponding author

Correspondence to Antoni Portero.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Portero, A. et al. (2023). COMPESCE: A Co-design Approach for Memory Subsystem Performance Analysis in HPC Many-Cores. In: Goumas, G., Tomforde, S., Brehm, J., Wildermann, S., Pionteck, T. (eds) Architecture of Computing Systems. ARCS 2023. Lecture Notes in Computer Science, vol 13949. Springer, Cham. https://doi.org/10.1007/978-3-031-42785-5_8

  • DOI: https://doi.org/10.1007/978-3-031-42785-5_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42784-8

  • Online ISBN: 978-3-031-42785-5

  • eBook Packages: Computer Science, Computer Science (R0)
