SU3_Bench on a Programmable Integrated Unified Memory Architecture (PIUMA) and How that Differs from Standard NUMA CPUs

  • Conference paper
High Performance Computing (ISC High Performance 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13289)

Abstract

SU3_Bench explores performance portability across multiple programming models using a simple but nontrivial mathematical kernel. This kernel is derived from the Lattice Quantum Chromodynamics (LQCD) code used in applications such as hadron physics and should therefore be of interest to the scientific community.

SU3_Bench has regular compute and data access patterns, and on most traditional CPU- and GPU-based systems its performance is determined mainly by the achievable memory bandwidth. However, this paper shows that on the new Intel Programmable Integrated Unified Memory Architecture (PIUMA), which is designed for sparse workloads and has scalar cores and a balanced flops-to-byte ratio, SU3_Bench’s performance is determined by the total number of instructions that can be executed per cycle (pipeline throughput) rather than by bandwidth or flops. We present the performance analysis, porting, and optimization of SU3_Bench on the PIUMA architecture and discuss how these differ from those on standard NUMA CPUs (e.g., Xeon required NUMA optimizations, whereas PIUMA did not). We also present iso-bandwidth and iso-power comparisons of SU3_Bench on PIUMA versus Xeon, as well as performance-efficiency comparisons of SU3_Bench on PIUMA, Xeon, GPUs, and FPGAs based on pre-existing data. The lessons learned generalize to other similar kernels.

Notes

  1. We are in the power-on phase of a PIUMA system, and we plan to update the simulated results and integrate them with actual experimental data.

Author information

Corresponding author

Correspondence to Jesmin Jahan Tithi.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Tithi, J.J., Checconi, F., Doerfler, D., Petrini, F. (2022). SU3_Bench on a Programmable Integrated Unified Memory Architecture (PIUMA) and How that Differs from Standard NUMA CPUs. In: Varbanescu, AL., Bhatele, A., Luszczek, P., Marc, B. (eds) High Performance Computing. ISC High Performance 2022. Lecture Notes in Computer Science, vol 13289. Springer, Cham. https://doi.org/10.1007/978-3-031-07312-0_4

  • DOI: https://doi.org/10.1007/978-3-031-07312-0_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-07311-3

  • Online ISBN: 978-3-031-07312-0

  • eBook Packages: Computer Science, Computer Science (R0)
