Evolving the HPL benchmark towards multi-GPGPU clusters

  • Regular Paper
CCF Transactions on High Performance Computing

Abstract

HPL (High Performance Linpack) is a widely accepted benchmark for evaluating high-performance computer clusters. It measures performance by solving large dense linear systems, and its results serve as the basis for the TOP500 supercomputer ranking. As the performance gap between CPUs and GPGPUs widens, the non-compute-intensive parts of the workload become more time-critical and impede sustained HPL performance more severely. Traditionally, HPL on multi-GPGPU platforms enforces a one-to-one mapping between processes and devices. While this simplifies the implementation, it splits each node's system resources evenly among its processes, which lowers system utilization in the major time-critical algorithmic steps of HPL. In this paper, we propose a novel device-centric HPL approach for current mainstream multi-GPGPU platforms, in which each process can make full use of the resources of a node, including accelerators, CPU sockets, PCIe buses, and memory and network bandwidth. As a result, the CPU-side workload and the inter-process communication are greatly accelerated thanks to higher system utilization, while the computation on the device side remains efficient. Experiments show that on a single workstation with 4 GPGPUs, our approach achieves more than 80% of the theoretical peak and nearly 95% of the dgemm performance, which is significantly higher than the state-of-the-art counterpart on the same platform. On multi-GPGPU clusters, we also substantially improve sustained performance and efficiency compared with previous HPL work that incorporates multi-GPGPU features. Further, based on both performance analysis and the experimental results, we believe that our approach may serve as a competitive cornerstone for further optimizations on future heterogeneous platforms.
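The efficiency figures quoted in the abstract are ratios of sustained HPL throughput to a reference rate, either the theoretical peak or the measured dgemm rate. A minimal sketch of that arithmetic follows, assuming the operation count that the reference HPL code credits for an N-by-N solve, (2/3)N^3 + (3/2)N^2. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
def hpl_flops(n: int) -> float:
    """Floating-point operations credited for an n-by-n HPL solve.

    Assumes the count used by the reference HPL code: (2/3)n^3 + (3/2)n^2.
    """
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def sustained_gflops(n: int, seconds: float) -> float:
    """Sustained rate in GFLOPS for a solve of order n that took `seconds`."""
    return hpl_flops(n) / seconds / 1e9


def efficiency(n: int, seconds: float, reference_gflops: float) -> float:
    """Fraction of a reference rate (theoretical peak or dgemm rate) achieved."""
    return sustained_gflops(n, seconds) / reference_gflops


if __name__ == "__main__":
    # Hypothetical run, for illustration only (not a result from the paper).
    n, seconds, peak = 100_000, 30.0, 28_000.0
    print(f"sustained: {sustained_gflops(n, seconds):.1f} GFLOPS")
    print(f"efficiency vs. peak: {efficiency(n, seconds, peak):.1%}")
```

Passing the measured dgemm rate instead of the theoretical peak as `reference_gflops` gives the second kind of ratio mentioned in the abstract.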

Corresponding author

Correspondence to Qiao Sun.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sun, Q., Ma, W., Sun, J. et al. Evolving the HPL benchmark towards multi-GPGPU clusters. CCF Trans. HPC 5, 84–96 (2023). https://doi.org/10.1007/s42514-022-00128-6

  • DOI: https://doi.org/10.1007/s42514-022-00128-6
