
Evaluation of Alternatives to Accelerate Scientific Numerical Calculations on Graphics Processing Units Using Python

  • Conference paper
High Performance Computing (CARLA 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1887)

Abstract

In this paper, the Numba, JAX, CuPy, PyTorch, and TensorFlow Python GPU-accelerated libraries were benchmarked using scientific numerical kernels on an NVIDIA V100 GPU. The benchmarks consisted of a simple Monte Carlo estimation, a particle interaction kernel, a stencil evolution of an array, and tensor operations. The benchmarking procedure included general memory consumption measurements, a statistical analysis of scalability with problem size to determine the best library for each benchmark, and a productivity measurement using source lines of code (SLOC) as a metric. It was statistically determined that Numba outperforms the other libraries on the Monte Carlo, particle interaction, and stencil benchmarks, while the deep learning libraries show better performance on tensor operations. The SLOC count was similar for all libraries except Numba, whose higher count implies that more time is needed for code development.
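To illustrate the kind of kernel the abstract refers to, here is a minimal NumPy sketch of the Monte Carlo estimation pattern (estimating π by sampling the unit square). The function name and sample count are illustrative and not taken from the paper's repository; the benchmarked libraries accelerate this same pattern on the GPU.

```python
import numpy as np

def estimate_pi(n_samples: int, seed: int = 0) -> float:
    """Estimate pi by sampling points uniformly in the unit square
    and counting those falling inside the quarter circle."""
    rng = np.random.default_rng(seed)
    x = rng.random(n_samples)
    y = rng.random(n_samples)
    inside = np.count_nonzero(x * x + y * y <= 1.0)
    return 4.0 * inside / n_samples
```

With CuPy, essentially the same array code runs on the GPU by swapping the `numpy` module for `cupy`; Numba would instead express the sampling loop as a JIT-compiled kernel, which is consistent with its higher SLOC count reported above.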


Notes

  1. The source code of these benchmarks may be found in this repository: https://gitlab.com/CNCA_CeNAT/sc_parallelpython.

References

  1. The top programming languages. https://octoverse.github.com/2022/top-programming-languages. Accessed 23 June 2022

  2. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., et al.: TensorFlow: large-scale machine learning on heterogeneous distributed systems (2016). https://arxiv.org/abs/1603.04467

  3. Alnaasan, N., Jain, A., Shafi, A., Subramoni, H., Panda, D.K.: OMB-PY: Python micro-benchmarks for evaluating performance of MPI libraries on HPC systems (2021). https://arxiv.org/abs/2110.10659

  4. Asanović, K., Bodik, R., Catanzaro, B., et al.: The landscape of parallel computing research: a view from Berkeley. Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley, Dec 2006. http://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html

  5. Dogaru, R., Dogaru, I.: A Python framework for fast modelling and simulation of cellular nonlinear networks and other finite-difference time-domain systems (2021)


  6. Argonne Leadership Computing Facility: Aurora. https://www.alcf.anl.gov/aurora

  7. Frostig, R., Johnson, M.J., Leary, C.: Compiling machine learning programs via high-level tracing (2018)

  8. Holm, H.H., Brodtkorb, A.R., Sætra, M.L.: GPU computing with Python: performance, energy efficiency and usability. Computation 8 (2020). https://doi.org/10.3390/computation8010004

  9. Huang, S., Xiao, S., Feng, W.: On the energy efficiency of graphics processing units for scientific computing. In: 2009 IEEE International Symposium on Parallel and Distributed Processing, pp. 1–8 (2009). https://doi.org/10.1109/IPDPS.2009.5160980

  10. Lam, S.K., Pitrou, A., Seibert, S.: Numba: an LLVM-based Python JIT compiler. Association for Computing Machinery (2015). https://doi.org/10.1145/2833157.2833162

  11. Marowka, A.: Python accelerators for high-performance computing. J. Supercomput. 74, 1449–1460 (2018). https://doi.org/10.1007/s11227-017-2213-5

  12. Montgomery, D.C.: Design and analysis of experiments (2017)


  13. NVIDIA: NVIDIA HPC application performance. https://developer.nvidia.com/hpc-application-performance

  14. Okuta, R., Unno, Y., Nishino, D., Hido, S., Loomis, C.: CuPy: a NumPy-compatible library for NVIDIA GPU calculations. https://github.com/cupy/cupy

  15. Paszke, A., Gross, S., Massa, F., et al.: PyTorch: an imperative style, high-performance deep learning library (2019). https://arxiv.org/abs/1912.01703

  16. Pata, J., Dutta, I., Lu, N., Vlimant, J.R., et al.: ETH library data analysis with GPU-accelerated kernels (2020). https://doi.org/10.3929/ethz-b-000484721

  17. Springer, P.L.: Berkeley’s dwarfs on CUDA (2012)


  18. Vetter, J.S., Brightwell, R., Gokhale, M., McCormick, P., et al.: Extreme heterogeneity 2018 - productive computational science in the era of extreme heterogeneity: report for DOE ASCR workshop on extreme heterogeneity (2018). https://doi.org/10.2172/1473756, https://www.osti.gov/servlets/purl/1473756/

  19. Yang, C.: Hierarchical roofline analysis: how to collect data using performance tools on Intel CPUs and NVIDIA GPUs (2020)


  20. Ziogas, A.N., Ben-Nun, T., Schneider, T., Hoefler, T.: NPBench: a benchmarking suite for high-performance NumPy, pp. 63–74. Association for Computing Machinery (2021). https://doi.org/10.1145/3447818.3460360


Acknowledgments

This research was partially supported by a machine allocation on the Kabré supercomputer at the Costa Rica National High Technology Center.

Author information

Corresponding author

Correspondence to Johansell Villalobos.

A Roofline Graphs for the Algorithms Proposed

The algorithms proposed in this work were profiled with the NVIDIA Nsight Compute command-line interface (CLI) using the methodology described in [19]. The resulting roofline graphs are shown in Figs. 4 and 5.
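For context, a roofline graph plots a kernel's achieved floating-point throughput against its arithmetic intensity. Under the standard roofline model (the formula below is the textbook form, not reproduced from the paper), attainable performance is bounded by

```latex
P_{\text{attainable}} = \min\left(P_{\text{peak}},\; I \cdot B\right)
```

where \(P_{\text{peak}}\) is the peak compute throughput (FLOP/s), \(B\) is the memory bandwidth (bytes/s), and \(I\) is the arithmetic intensity (FLOPs per byte moved). Kernels with \(I < P_{\text{peak}}/B\) sit under the sloped, memory-bound part of the roof; the rest are compute-bound.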

Fig. 4. Monte Carlo (a), particles (b), and stencil (c) algorithms roofline graphs.

Fig. 5. Matrix multiplication (a), outer product (b), and matrix contraction (c) algorithms roofline graphs.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Villalobos, J., Meneses, E. (2024). Evaluation of Alternatives to Accelerate Scientific Numerical Calculations on Graphics Processing Units Using Python. In: Barrios H., C.J., Rizzi, S., Meneses, E., Mocskos, E., Monsalve Diaz, J.M., Montoya, J. (eds) High Performance Computing. CARLA 2023. Communications in Computer and Information Science, vol 1887. Springer, Cham. https://doi.org/10.1007/978-3-031-52186-7_1


  • DOI: https://doi.org/10.1007/978-3-031-52186-7_1


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-52185-0

  • Online ISBN: 978-3-031-52186-7

  • eBook Packages: Computer Science, Computer Science (R0)
