
Evaluation of Alternatives to Accelerate Scientific Numerical Calculations on Graphics Processing Units Using Python

  • Conference paper
High Performance Computing (CARLA 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1887)

Abstract

In this paper, the Numba, JAX, CuPy, PyTorch, and TensorFlow Python GPU-accelerated libraries were benchmarked using scientific numerical kernels on an NVIDIA V100 GPU. The benchmarks consisted of a simple Monte Carlo estimation, a particle interaction kernel, a stencil evolution of an array, and tensor operations. The benchmarking procedure included general memory consumption measurements, a statistical analysis of scalability with problem size to determine the best library for each benchmark, and a productivity measurement using source lines of code (SLOC) as a metric. It was statistically determined that Numba outperforms the other libraries on the Monte Carlo, particle interaction, and stencil benchmarks, while the deep learning libraries show better performance on tensor operations. The SLOC count was similar for all libraries except Numba, whose higher count implies that more time is needed for code development.
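To illustrate the kind of kernel the abstract refers to, here is a minimal NumPy sketch of the Monte Carlo estimation pattern (estimating π by sampling the unit square). The function name and sample count are illustrative and not taken from the paper's repository; the benchmarked libraries accelerate this same pattern on the GPU.

```python
import numpy as np

def estimate_pi(n_samples: int, seed: int = 0) -> float:
    """Estimate pi by sampling points uniformly in the unit square
    and counting those falling inside the quarter circle."""
    rng = np.random.default_rng(seed)
    x = rng.random(n_samples)
    y = rng.random(n_samples)
    inside = np.count_nonzero(x * x + y * y <= 1.0)
    return 4.0 * inside / n_samples
```

With CuPy, essentially the same array code runs on the GPU by swapping the `numpy` module for `cupy`; Numba would instead express the sampling loop as a JIT-compiled kernel, which is consistent with its higher SLOC count reported above.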


Notes

  1. The source code of these benchmarks may be found in this repository: https://gitlab.com/CNCA_CeNAT/sc_parallelpython.

References

  1. The top programming languages. https://octoverse.github.com/2022/top-programming-languages. Accessed 23 June 2022

  2. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., et al.: TensorFlow: large-scale machine learning on heterogeneous distributed systems (2016). https://arxiv.org/abs/1603.04467

  3. Alnaasan, N., Jain, A., Shafi, A., Subramoni, H., Panda, D.K.: OMB-PY: Python micro-benchmarks for evaluating performance of MPI libraries on HPC systems (2021). https://arxiv.org/abs/2110.10659

  4. Asanović, K., Bodik, R., Catanzaro, B., et al.: The landscape of parallel computing research: a view from Berkeley. Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley, Dec 2006. http://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html

  5. Dogaru, R., Dogaru, I.: A Python framework for fast modelling and simulation of cellular nonlinear networks and other finite-difference time-domain systems (2021)


  6. Argonne Leadership Computing Facility: Aurora. https://www.alcf.anl.gov/aurora

  7. Frostig, R., Johnson, M.J., Leary, C.: Compiling machine learning programs via high-level tracing (2018)

  8. Holm, H.H., Brodtkorb, A.R., Sætra, M.L.: GPU computing with Python: performance, energy efficiency and usability. Computation 8 (2020). https://doi.org/10.3390/computation8010004

  9. Huang, S., Xiao, S., Feng, W.: On the energy efficiency of graphics processing units for scientific computing. In: 2009 IEEE International Symposium on Parallel and Distributed Processing, pp. 1–8 (2009). https://doi.org/10.1109/IPDPS.2009.5160980

  10. Lam, S.K., Pitrou, A., Seibert, S.: Numba: an LLVM-based Python JIT compiler. Association for Computing Machinery (2015). https://doi.org/10.1145/2833157.2833162

  11. Marowka, A.: Python accelerators for high-performance computing. J. Supercomput. 74, 1449–1460 (2018). https://doi.org/10.1007/s11227-017-2213-5

  12. Montgomery, D.C.: Design and analysis of experiments (2017)


  13. NVIDIA: NVIDIA HPC application performance. https://developer.nvidia.com/hpc-application-performance

  14. Okuta, R., Unno, Y., Nishino, D., Hido, S., Loomis, C.: CuPy: a NumPy-compatible library for NVIDIA GPU calculations. https://github.com/cupy/cupy

  15. Paszke, A., Gross, S., Massa, F., et al.: PyTorch: an imperative style, high-performance deep learning library (2019). https://arxiv.org/abs/1912.01703

  16. Pata, J., Dutta, I., Lu, N., Vlimant, J.R., et al.: ETH library data analysis with GPU-accelerated kernels (2020). https://doi.org/10.3929/ethz-b-000484721

  17. Springer, P.L.: Berkeley’s dwarfs on CUDA (2012)


  18. Vetter, J.S., Brightwell, R., Gokhale, M., McCormick, P., et al.: Extreme heterogeneity 2018 - productive computational science in the era of extreme heterogeneity: report for DOE ASCR workshop on extreme heterogeneity (2018). https://doi.org/10.2172/1473756, https://www.osti.gov/servlets/purl/1473756/

  19. Yang, C.: Hierarchical roofline analysis: how to collect data using performance tools on Intel CPUs and NVIDIA GPUs (2020)


  20. Ziogas, A.N., Ben-Nun, T., Schneider, T., Hoefler, T.: NPBench: a benchmarking suite for high-performance NumPy, pp. 63–74. Association for Computing Machinery (2021). https://doi.org/10.1145/3447818.3460360


Acknowledgments

This research was partially supported by a machine allocation on the Kabré supercomputer at the Costa Rica National High Technology Center.

Author information

Corresponding author

Correspondence to Johansell Villalobos.

A Roofline Graphs for the Algorithms Proposed

The algorithms proposed in this work were profiled with the NVIDIA Nsight Compute command-line interface (CLI) using the methodology described in [19]. The resulting roofline graphs are shown in Figs. 4 and 5.
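For context, a roofline graph plots a kernel's achieved floating-point throughput against its arithmetic intensity. Under the standard roofline model (the formula below is the textbook form, not reproduced from the paper), attainable performance is bounded by

```latex
P_{\text{attainable}} = \min\left(P_{\text{peak}},\; I \cdot B\right)
```

where \(P_{\text{peak}}\) is the peak compute throughput (FLOP/s), \(B\) is the memory bandwidth (bytes/s), and \(I\) is the arithmetic intensity (FLOPs per byte moved). Kernels with \(I < P_{\text{peak}}/B\) sit under the sloped, memory-bound part of the roof; the rest are compute-bound.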

Fig. 4. Monte Carlo (a), particles (b), and stencil (c) algorithms roofline graphs.

Fig. 5. Matrix multiplication (a), outer product (b), and matrix contraction (c) algorithms roofline graphs.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Villalobos, J., Meneses, E. (2024). Evaluation of Alternatives to Accelerate Scientific Numerical Calculations on Graphics Processing Units Using Python. In: Barrios H., C.J., Rizzi, S., Meneses, E., Mocskos, E., Monsalve Diaz, J.M., Montoya, J. (eds) High Performance Computing. CARLA 2023. Communications in Computer and Information Science, vol 1887. Springer, Cham. https://doi.org/10.1007/978-3-031-52186-7_1


  • DOI: https://doi.org/10.1007/978-3-031-52186-7_1


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-52185-0

  • Online ISBN: 978-3-031-52186-7

  • eBook Packages: Computer Science, Computer Science (R0)
