Design of a simulation model for high performance LINPACK in hybrid CPU-GPU systems

The Journal of Supercomputing

Abstract

The High Performance LINPACK (HPL) benchmark is used to evaluate the maximum floating-point performance of a computer cluster. Because the performance of graphics processing units (GPUs) has improved rapidly, many researchers have started to optimize the HPL benchmark with GPUs to maximize system utilization. Nevertheless, it is difficult to determine the optimal combination of parameters in this process because of the complexity of the input. Running HPL on a heterogeneous system is therefore time-consuming and inflexible across different hardware components. In this paper we propose a simulation model of HPL. The model is not limited by hardware components and can simulate the execution of HPL across different computing nodes in heterogeneous GPU-enhanced clusters at any scale. It can also help researchers evaluate floating-point performance quickly and provide a reference for hardware investment.
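As a rough illustration of what "floating-point performance" means for HPL, the sketch below converts a problem size and a run time into a GFLOPS figure using the operation count that the reference HPL code credits for an N x N solve, (2/3)N^3 + (3/2)N^2. It is not the simulation model proposed in the paper. The function names and the example values for N, run time, and peak performance are hypothetical and chosen only for illustration.

```python
def hpl_flop_count(n: int) -> float:
    """Operations credited by reference HPL for an n x n dense solve."""
    return (2.0 / 3.0) * n**3 + (3.0 / 2.0) * n**2


def hpl_gflops(n: int, seconds: float) -> float:
    """Sustained performance in GFLOPS for a run that took `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return hpl_flop_count(n) / seconds / 1e9


def efficiency(measured_gflops: float, peak_gflops: float) -> float:
    """Fraction of the theoretical peak that the run achieved."""
    if peak_gflops <= 0:
        raise ValueError("peak_gflops must be positive")
    return measured_gflops / peak_gflops


if __name__ == "__main__":
    # Hypothetical run: N, wall time, and peak are made-up example values.
    n, seconds, peak = 100_000, 500.0, 2_000.0
    measured = hpl_gflops(n, seconds)
    print(f"{measured:.1f} GFLOPS, {efficiency(measured, peak):.1%} of peak")
```

A simulator replaces the measured run time with a predicted one, so the same conversion applies to simulated results.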

Notes

  1. Single-node HPL simulation model: http://www.i-test.com.cn/hpc/singleNode.html.

    Multi-node HPL simulation model: http://www.i-test.com.cn/hpc/multiNode.html.

Acknowledgment

This work was supported by the National Natural Science Foundation of China (CN) and the Guangzhou Produce & Research Fund under grant no. 201902020004.

Author information

Corresponding author

Correspondence to Lu Lu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hu, Y., Lu, L. Design of a simulation model for high performance LINPACK in hybrid CPU-GPU systems. J Supercomput 77, 13739–13756 (2021). https://doi.org/10.1007/s11227-021-03829-x

  • DOI: https://doi.org/10.1007/s11227-021-03829-x
