Abstract
The High Performance LINPACK (HPL) benchmark is used to evaluate the maximum floating-point performance of a computer cluster. As graphics processing unit (GPU) performance has improved rapidly, many researchers have begun optimizing the HPL benchmark for GPUs to maximize system utilization. Nevertheless, determining the optimal combination of tuning parameters is difficult because of the complexity of the input, so running HPL on a heterogeneous system is time-consuming and inflexible across different hardware configurations. In this paper, we therefore propose a simulation model of HPL. The model is not limited by the available hardware and can simulate the execution of HPL across the computing nodes of heterogeneous GPU-enhanced clusters at any scale. It can also help researchers evaluate floating-point performance quickly and provide a reference for hardware investment decisions.
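As a rough illustration of the kind of estimate such a model produces (a minimal sketch under stated assumptions, not the model proposed in this paper), the Python snippet below predicts HPL runtime and sustained performance from the operation count that HPL itself reports, 2N³/3 + 2N² floating-point operations. The function name, parameters, and all hardware figures are hypothetical.

# Minimal sketch (assumptions only, not the paper's model): estimate HPL
# runtime and sustained performance for a GPU cluster from the flop count
# HPL reports, 2/3*N^3 + 2*N^2. Every input value here is hypothetical.

def hpl_estimate(n, nodes, gpus_per_node, gpu_peak_gflops,
                 cpu_peak_gflops, efficiency):
    """Return (estimated runtime in seconds, estimated Rmax in GFLOPS)."""
    flops = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2   # HPL's reported flop count
    rpeak = nodes * (gpus_per_node * gpu_peak_gflops + cpu_peak_gflops)
    rmax = efficiency * rpeak                     # assumed sustained GFLOPS
    seconds = flops / (rmax * 1e9)                # GFLOPS -> flop/s
    return seconds, rmax

if __name__ == "__main__":
    # Hypothetical 4-node cluster with 2 GPUs per node, 70% assumed efficiency.
    t, rmax = hpl_estimate(n=100_000, nodes=4, gpus_per_node=2,
                           gpu_peak_gflops=7000.0, cpu_peak_gflops=1500.0,
                           efficiency=0.70)
    print(f"Estimated runtime: {t:.1f} s, estimated Rmax: {rmax:.0f} GFLOPS")

A simulation model such as the one proposed here refines this back-of-the-envelope estimate by simulating the execution of HPL itself, so that tuning parameters such as the problem size N, the block size NB, and the P × Q process grid can be explored without occupying the real cluster.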
Notes
Single-node HPL simulation model: http://www.i-test.com.cn/hpc/singleNode.html.
Multi-node HPL simulation model: http://www.i-test.com.cn/hpc/multiNode.html.
Acknowledgment
This work was supported by the National Natural Science Foundation of China and the Guangzhou Produce & Research Fund under grant no. 201902020004.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Hu, Y., Lu, L. Design of a simulation model for high performance LINPACK in hybrid CPU-GPU systems. J Supercomput 77, 13739–13756 (2021). https://doi.org/10.1007/s11227-021-03829-x