Design of a simulation model for high performance LINPACK in hybrid CPU-GPU systems

Hu, Yichang; Lu, Lu

doi:10.1007/s11227-021-03829-x

Published: 03 May 2021

Volume 77, pages 13739–13756, (2021)

The Journal of Supercomputing

Yichang Hu¹ &
Lu Lu¹

Abstract

The High Performance LINPACK (HPL) benchmark is used to evaluate the maximum floating-point performance of a computer cluster. Because the performance of graphics processing units (GPUs) has improved rapidly, many researchers have started to optimize the HPL benchmark for GPUs to maximize system utilization. However, the complexity of the input makes it difficult to determine the optimal combination of parameters in this process. As a result, running HPL on a heterogeneous system is time-consuming and inflexible across different hardware components. In this paper, we therefore propose a simulation model of HPL. The model is not limited by hardware components and can simulate the execution of HPL across different computing nodes in heterogeneous GPU-enhanced clusters at any scale. It can also help researchers evaluate floating-point performance quickly and provide a reference for hardware investment.
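
To make the tuning problem mentioned in the abstract concrete, the following is a minimal Python sketch, not the authors' simulation model. It uses the operation count that the reference HPL implementation credits for an N-by-N system, (2/3)N^3 + (3/2)N^2, to turn a run time into a GFLOPS figure, and it enumerates candidate inputs (problem size N, block size NB, process grid P x Q). The candidate values, the process count, and all function names are hypothetical and chosen only for illustration.

```python
from itertools import product
from typing import List, Tuple


def hpl_flops(n: int) -> float:
    """Operation count credited by HPL for solving an n-by-n dense system."""
    return (2.0 / 3.0) * n**3 + (3.0 / 2.0) * n**2


def hpl_gflops(n: int, seconds: float) -> float:
    """Reported performance in GFLOPS for problem size n and wall time."""
    return hpl_flops(n) / seconds / 1e9


def process_grids(num_procs: int) -> List[Tuple[int, int]]:
    """All (P, Q) process grids with P * Q == num_procs."""
    return [(p, num_procs // p)
            for p in range(1, num_procs + 1)
            if num_procs % p == 0]


# Hypothetical candidate values, for illustration only.
problem_sizes = [20_000, 40_000, 80_000]   # N
block_sizes = [128, 256, 512]              # NB
grids = process_grids(8)                   # P x Q for 8 processes

# Every combination would need one real HPL run (or one simulated run).
configs = list(product(problem_sizes, block_sizes, grids))
print(f"{len(configs)} candidate configurations")
```

Even this small grid of values yields dozens of runs, each of which can take a long time on real hardware. That cost is the motivation the abstract gives for replacing real runs with a simulation.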

Notes

Single-node HPL simulation model: http://www.i-test.com.cn/hpc/singleNode.html.
Multi-node HPL simulation model: http://www.i-test.com.cn/hpc/multiNode.html.

Acknowledgment

This work was supported by the National Natural Science Foundation of China (CN) and the Guangzhou Produce & Research Fund under grant no. 201902020004.

Author information

Authors and Affiliations

School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510000, China
Yichang Hu & Lu Lu

Corresponding author

Correspondence to Lu Lu.

About this article

Cite this article

Hu, Y., Lu, L. Design of a simulation model for high performance LINPACK in hybrid CPU-GPU systems. J Supercomput 77, 13739–13756 (2021). https://doi.org/10.1007/s11227-021-03829-x

Accepted: 18 April 2021
Published: 03 May 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s11227-021-03829-x
