Abstract
In this paper, the GPU-accelerated Python libraries Numba, JAX, CuPy, PyTorch, and TensorFlow were benchmarked using scientific numerical kernels on an NVIDIA V100 GPU. The benchmarks consisted of a simple Monte Carlo estimation, a particle interaction kernel, a stencil evolution of an array, and tensor operations. The benchmarking procedure included overall memory consumption measurements, a statistical analysis of scalability with problem size to determine the best-performing libraries for each benchmark, and a productivity measurement using source lines of code (SLOC) as a metric. It was statistically determined that Numba outperforms the other libraries on the Monte Carlo, particle interaction, and stencil benchmarks, while the deep learning libraries perform better on tensor operations. The SLOC count was similar across all the libraries except Numba, whose higher count implies that more time is needed for code development.
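To illustrate the simplest of the four kernels, a Monte Carlo estimate of π can be written in a few lines of array code. The sketch below uses NumPy and is not taken from the paper's repository; the function name and seed are illustrative. The array-based libraries compared in the paper (CuPy, JAX, PyTorch, TensorFlow) accept essentially the same formulation with the array module swapped, which is one reason their SLOC counts are similar, whereas Numba would express it as an explicit CUDA kernel.

```python
import numpy as np

def monte_carlo_pi(n_samples, rng=None):
    """Estimate pi by sampling uniform points in the unit square
    and counting the fraction that lands inside the quarter circle."""
    rng = rng or np.random.default_rng(0)
    x = rng.random(n_samples)
    y = rng.random(n_samples)
    inside = np.count_nonzero(x * x + y * y <= 1.0)
    return 4.0 * inside / n_samples

print(monte_carlo_pi(1_000_000))  # an estimate close to 3.1416
```

With CuPy, for example, replacing `import numpy as np` by `import cupy as np` (and drawing samples with `cupy.random`) moves the same computation to the GPU.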
Notes
1. The source code of these benchmarks may be found in this repository: https://gitlab.com/CNCA_CeNAT/sc_parallelpython.
Acknowledgments
This research was partially supported by a machine allocation on Kabré supercomputer at the Costa Rica National High Technology Center.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Villalobos, J., Meneses, E. (2024). Evaluation of Alternatives to Accelerate Scientific Numerical Calculations on Graphics Processing Units Using Python. In: Barrios H., C.J., Rizzi, S., Meneses, E., Mocskos, E., Monsalve Diaz, J.M., Montoya, J. (eds) High Performance Computing. CARLA 2023. Communications in Computer and Information Science, vol 1887. Springer, Cham. https://doi.org/10.1007/978-3-031-52186-7_1
Print ISBN: 978-3-031-52185-0
Online ISBN: 978-3-031-52186-7
eBook Packages: Computer Science, Computer Science (R0)