
Monte Carlo Automatic Integration with Dynamic Parallelism in CUDA

Chapter in Numerical Computations with GPUs

Abstract

The rapidly evolving CUDA environment is well suited for numerical integration of high-dimensional integrals, in particular by Monte Carlo or quasi-Monte Carlo methods. With some care, near-peak performance can be obtained on important applications. We present a basis for efficient numerical integration using CUDA kernels on NVIDIA GPUs, showing several ways to use CUDA features, provide automatic error control, and prevent or detect roundoff errors. This framework allows easy extension to multiple GPUs, clusters, and clouds for addressing problems that were impractical to attack in the past, and forms the basis for an update to the ParInt numerical integration package.


References

  1. CUDA Library. http://www.nvidia.com/getcuda (last accessed May 2014)

  2. Brown, R.: DIEHARDER. http://www.phy.duke.edu/~rgb/General/dieharder.php (last accessed May 2014)

  3. Chan, T.F., Golub, G.H., LeVeque, R.J.: Updating formulae and a pairwise algorithm for computing sample variances. Technical Report STAN-CS-79-773, Stanford University (1979). ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/773/CS-TR-79-773.pdf

  4. Davis, P.J., Rabinowitz, P.: Methods of Numerical Integration. Academic, New York (1975)


  5. de Doncker, E., Assaf, R.: GPU integral computations in stochastic geometry. In: VII Workshop Computational Geometry and Applications (CGA). Lecture Notes in Computer Science, vol. 7972, pp. 129–139 (2013)


  6. de Doncker, E., Kapenga, J., Liou, W.W.: Open source software for Monte Carlo/DSMC applications. In: 55th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, The American Institute of Aeronautics and Astronautics (AIAA) (2014). doi:10.2514/6.2014-0348


  7. de Doncker, E., Yuasa, F.: Distributed and multi-core computation of 2-loop integrals. In: 15th International Workshop on Adv. Computing and Analysis Techniques in Physics (ACAT 2013), Journal of Physics, Conference Series. To appear (2014).


  8. Demmel, J., Nguyen, H.D.: Fast reproducible floating-point summation. In: 21st IEEE Symposium on Computer Arithmetic (ARITH), pp. 163–172 (2013)


  9. Genz, A.: MVNPACK. http://www.math.wsu.edu/faculty/genz/software/fort77/mvnpack.f (2010)

  10. Goldberg, D.: What every computer scientist should know about floating-point arithmetic. ACM Comput. Surv. 23(1), 5–48 (1991)


  11. Higham, N.J.: Accuracy and Stability of Numerical Algorithms, 2nd edn. SIAM, Philadelphia, Addison-Wesley (2002). ISBN 978-0-898715-21-7


  12. IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754-1985. Institute of Electrical and Electronics Engineers, New York (1985). Reprinted in SIGPLAN Notices 22(2), 9–25 (1987)


  13. IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754-2008. Institute of Electrical and Electronics Engineers, New York (2008)


  14. Kapenga, J., de Doncker, E.: Compensated summation on multiple NVIDIA GPUs. HPCS Technical Report HPCS-2014-1, Western Michigan University (2014)


  15. Knuth, D.E.: The Art of Computer Programming, Volume 2, Seminumerical Algorithms, 3rd edn. Addison-Wesley (1998)


  16. L’Ecuyer, P.: Combined multiple recursive random number generators. Oper. Res. 44, 816–822 (1996)


  17. Laporta, S.: High-precision calculation of multi-loop Feynman integrals by difference equations. Int. J. Mod. Phys. A 15, 5087–5159 (2000). arXiv:hep-ph/0102033v1


  18. L’Ecuyer, P., Simard, R.: TestU01: a C library for empirical testing of random number generators. ACM Trans. Math. Softw. 33(4), Article 22 (2007)


  19. Manssen, M., Weigel, M., Hartmann, A.K.: Random number generators for massively parallel simulations on GPU (2012). arXiv:1204.6193v1 [physics.comp-ph] 27 April 2012


  20. Marsaglia, G.: DIEHARD: a battery of tests of randomness. http://www.stat.fsu.edu/pub/diehard

  21. Marsaglia, G.: Xorshift RNGs. J. Stat. Softw. 8, 1–6 (2003)


  22. Matsumoto, M., Nishimura, T.: Mersenne Twister: a 623-dimensionally equidistributed uniform pseudorandom number generator. ACM Trans. Model. Comput. Simul. 8(1), 3–30 (1998)


  23. Muller, J.-M., Brisebarre, N., de Dinechin, F., Jeannerod, C.-P., Lefèvre, V., Melquiond, G., Revol, N., Stehlé, D., Torres, S.: Handbook of Floating-Point Arithmetic. Birkhäuser, Boston (2010). ISBN 978-0-8176-4704-9


  24. NVIDIA. Tesla Product Literature. http://www.nvidia.com/object/tesla_product_literature.html (last accessed May 2014)

  25. NVIDIA. http://developer.download.nvidia.com/assets/cuda/files/CUDADownloads/TechBrief_Dynamic_Parallelism_in_CUDA.pdf (last accessed May 2014)

  26. Rump, S.M., Ogita, T., Oishi, S.: Accurate floating-point summation, Part I: Faithful rounding. SIAM J. Sci. Comput. 31(1), 189–224 (2008)


  27. Rump, S.M., Ogita, T., Oishi, S.: Accurate floating-point summation, Part II: Sign, K-fold faithful and rounding to nearest. SIAM J. Sci. Comput. 31(2), 1269–1302 (2008)


  28. Saito, M., Matsumoto, M.: Variants of Mersenne Twister suitable for graphic processors. ACM Trans. Math. Softw. 39(2), Article 12 (2013)


  29. Salmon, J.K., Moraes, M.A.: Random123: a library of counter-based random number generators. http://deshawresearch.com/resources_random123.html, and Random123-1.06 Documentation, http://www.thesalmons.org/john/random123/releases/1.06/docs (last accessed May 2014)

  30. Salmon, J.K., Moraes, M.A., Dror, R.O., Shaw, D.E.: Parallel random numbers: as easy as 1, 2, 3. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC11) (2011)


  31. Sanders, J., Kandrot, E.: CUDA by Example - An Introduction to General-Purpose GPU Programming. Addison-Wesley, Reading (2011). ISBN: 978-0-13-138768-3


  32. SPRNG: The scalable parallel random number generators library. http://www.sprng.org (last accessed May 2014)

  33. Whitehead, N., Fit-Florea, A.: Precision & performance: floating point and IEEE 754 compliance for NVIDIA GPUs. NVIDIA (2011). http://developer.download.nvidia.com/assets/cuda/files/NVIDIA-CUDA-Floating-Point.pdf


Acknowledgements

We acknowledge the support from the National Science Foundation under Award Number 1126438, and from NVIDIA for the award of our CUDA Teaching Center.

Author information

Correspondence to Elise de Doncker.


Appendix

As an application we treated a Feynman (two-)loop integral, which was previously considered in [7, 17]. This type of problem is important for the calculation of the cross section of particle interactions in high energy physics.

Table 13.2 lists the integral approximation, the absolute error and error estimate, and the parallel time of a GPU computation on the Kepler K20, using the MC kernel with Random123 as the PRNG and Kahan summation on the GPU. The results in the bottom part of the table (for N ≥ 4 × 10^8) are obtained with the horizontal dynamic parallelism strategy and a chunk size of 200 million. For the smaller values of N in the top part of the table, the chunk size is set equal to N, so no child kernels are launched. For these values of N, the execution time is compared to that of a corresponding sequential calculation in which erand48() generates the pseudo-random sequence on the CPU. Speedups near full peak performance are observed. Note that the error decreases to 9.9e−08 at N = 4 × 10^10. The integrand function is given below.

Table 13.2 Times and speedup results for a Feynman loop integral


Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

de Doncker, E., Kapenga, J., Assaf, R. (2014). Monte Carlo Automatic Integration with Dynamic Parallelism in CUDA. In: Kindratenko, V. (eds) Numerical Computations with GPUs. Springer, Cham. https://doi.org/10.1007/978-3-319-06548-9_13

  • DOI: https://doi.org/10.1007/978-3-319-06548-9_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06547-2

  • Online ISBN: 978-3-319-06548-9

  • eBook Packages: Computer Science (R0)
