Abstract
The rapidly evolving CUDA environment is well suited for numerical integration of high-dimensional integrals, in particular by Monte Carlo or quasi-Monte Carlo methods. With some care, near-peak performance can be obtained on important applications. A basis for efficient numerical integration using CUDA kernels on NVIDIA GPUs is presented, showing several ways to use CUDA features, provide automatic error control, and prevent or detect roundoff errors. This framework allows easy extension to multiple GPUs, clusters and clouds for addressing problems that were impractical to attack in the past, and is the basis for an update to the ParInt numerical integration package.
References
CUDA Library. http://www.nvidia.com/getcuda (last accessed May 2014)
Brown, R.: DIEHARDER. http://www.phy.duke.edu/~rgb/General/dieharder.php (last accessed May 2014)
Chan, T.F., Golub, G.H., LeVeque, R.J.: Updating formulae and a pairwise algorithm for computing sample variances. Technical Report STAN-CS-79-773, Stanford University ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/773/CS-TR-79-773.pdf (1979)
Davis, P.J., Rabinowitz, P.: Methods of Numerical Integration. Academic, New York (1975)
de Doncker, E., Assaf, R.: GPU integral computations in stochastic geometry. In: VII Workshop Computational Geometry and Applications (CGA). Lecture Notes in Computer Science, vol. 7972, pp. 129–139 (2013)
de Doncker, E., Kapenga, J., Liou, W.W.: Open source software for Monte Carlo/DSMC applications. In: 55th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, The American Institute of Aeronautics and Astronautics (AIAA) (2014). doi:10.2514/6.2014-0348
de Doncker, E., Yuasa, F.: Distributed and multi-core computation of 2-loop integrals. In: 15th International Workshop on Advanced Computing and Analysis Techniques in Physics (ACAT 2013), Journal of Physics: Conference Series. To appear (2014)
Demmel, J., Nguyen, H.D.: Fast reproducible floating-point summation. In: 2013 21st IEEE Symposium on Computer Arithmetic (ARITH), pp. 163–172 (2013)
Genz, A.: MVNPACK. http://www.math.wsu.edu/faculty/genz/software/fort77/mvnpack.f (2010)
Goldberg, D.: What every computer scientist should know about floating-point arithmetic. ACM Comput. Surv. 23(1), 5–48 (1991)
Higham, N.J.: Accuracy and Stability of Numerical Algorithms, 2nd edn. SIAM, Philadelphia (2002). ISBN 978-0-89871-521-7
IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754-1985. Institute of Electrical and Electronics Engineers, New York (1985). Reprinted in SIGPLAN Notices 22(2), 9–25 (1987)
IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754-2008. Institute of Electrical and Electronics Engineers, New York (2008)
Kapenga, J., de Doncker, E.: Compensated summation on multiple NVIDIA GPUs. HPCS Technical Report HPCS-2014-1, Western Michigan University (2014)
Knuth, D.E.: The Art of Computer Programming, Volume 2, Seminumerical Algorithms, 3rd edn. Addison-Wesley (1998)
L’Ecuyer, P.: Combined multiple recursive random number generators. Oper. Res. 44, 816–822 (1996)
Laporta, S.: High-precision calculation of multi-loop Feynman integrals by difference equations. Int. J. Mod. Phys. A 15, 5087–5159 (2000). arXiv:hep-ph/0102033v1
L’Ecuyer, P., Simard, R.: A C library for empirical testing of random number generators. ACM Trans. Math. Softw. 33, 22 (2007)
Manssen, M., Weigel, M., Hartmann, A.K.: Random number generators for massively parallel simulations on GPU (2012). arXiv:1204.6193v1 [physics.comp-ph] 27 April 2012
Marsaglia, G.: DIEHARD: a battery of tests of randomness. http://www.stat.fsu.edu/pub/diehard
Marsaglia, G.: Xorshift RNGs. J. Stat. Softw. 8, 1–6 (2003)
Matsumoto, M., Nishimura, T.: Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator. ACM Trans. Model. Comput. Simul. 8(1), 3–30 (1998)
Muller, J.-M., Brisebarre, N., de Dinechin, F., Jeannerod, C.-P., Lefèvre, V., Melquiond, G., Revol, N., Stehlé, D., Torres, S.: Handbook of Floating-Point Arithmetic. Birkhäuser, Boston (2010). ISBN 978-0-8176-4704-9
NVIDIA. Tesla Product Literature. http://www.nvidia.com/object/tesla_product_literature.html (last accessed May 2014)
NVIDIA. http://developer.download.nvidia.com/assets/cuda/files/CUDADownloads/TechBrief_Dynamic_Parallelism_in_CUDA.pdf (last accessed May 2014)
Rump, S.M., Ogita, T., Oishi, S.: Accurate floating-point summation part i: Faithful rounding. SIAM J. Sci. Comput. 31(1), 189–224 (2008)
Rump, S.M., Ogita, T., Oishi, S.: Accurate floating-point summation part ii: Sign, k-fold faithful and rounding to nearest. SIAM J. Sci. Comput. 31(2), 1269–1302 (2008)
Saito, M., Matsumoto, M.: Variants of Mersenne twister suitable for graphics processors. ACM Trans. Math. Softw. 39(2), Article 12 (2013)
Salmon, J.K., Moraes, M.A.: Random123: a library of counter-based random number generators. http://deshawresearch.com/resources_random123.html, and Random123-1.06 Documentation, http://www.thesalmons.org/john/random123/releases/1.06/docs (last accessed May 2014)
Salmon, J.K., Moraes, M.A., Dror, R.O., Shaw, D.E.: Parallel random numbers: as easy as 1, 2, 3. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC11) (2011)
Sanders, J., Kandrot, E.: CUDA by Example - An Introduction to General-Purpose GPU Programming. Addison-Wesley, Reading (2011). ISBN: 978-0-13-138768-3
SPRNG: The scalable parallel random number generators library. http://www.sprng.org (last accessed May 2014)
Whitehead, N., Fit-Floreas, A.: Precision & performance: Floating point and IEEE 754 compliance for NVIDIA GPUs. http://developer.download.nvidia.com/assets/cuda/files/NVIDIA-CUDA-Floating-Point.pdf Nvidia developers (2011)
Acknowledgements
We acknowledge the support from the National Science Foundation under Award Number 1126438, and from NVIDIA for the award of our CUDA Teaching Center.
Appendix
As an application we treated a Feynman (two-)loop integral, which was previously considered in [7, 17]. This type of problem is important for the calculation of the cross section of particle interactions in high energy physics.
Table 13.2 lists the integral approximation, absolute error and error estimate, and the parallel time of a GPU computation on Kepler K20, using the MC kernel with Random123 as the PRNG, and Kahan summation on the GPU. The results in the bottom part of the table (for N ≥ 4 × 108) are obtained using the horizontal dynamic parallelism strategy with a chunk size of 200 million. For the smaller values of N in the top part of the table, the chunk size is set equal to N, so no child kernels are launched. For these values of N, the execution time is compared to that of a corresponding sequential calculation where erand48() is called to generate the pseudo-random sequence on the CPU. Substantial speedups, approaching peak performance, are observed. Note that the error decreases to 9.9e−08 at N = 4 × 1010. The integrand function is given below.
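The Kahan (compensated) summation and chunking strategy described above can be sketched as follows. This is a host-side C++ illustration rather than the chapter's CUDA kernel: on the GPU each thread would keep its own running sum and compensation term, and child kernels would process the chunks. The function names (`mc_kahan`, `fxy`), the toy 2-D integrand, and the use of `std::mt19937_64` in place of Random123 are assumptions introduced for this sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <random>

// Toy integrand over the unit square; the Feynman loop integrand of the
// appendix would take its place. Exact integral of x*y over [0,1]^2 is 0.25.
static double fxy(double x, double y) { return x * y; }

// Chunked Monte Carlo estimate with Kahan compensated summation:
// the samples are processed in chunks (mirroring the horizontal dynamic
// parallelism strategy), and each addition recovers the low-order bits
// lost to rounding so that roundoff does not accumulate over large N.
double mc_kahan(double (*f)(double, double), long long n, long long chunk,
                unsigned long long seed) {
    std::mt19937_64 gen(seed);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    double sum = 0.0, c = 0.0;  // running sum and Kahan compensation
    for (long long done = 0; done < n; done += chunk) {
        long long m = std::min(chunk, n - done);  // size of this chunk
        for (long long i = 0; i < m; ++i) {
            double y = f(u(gen), u(gen)) - c;  // corrected addend
            double t = sum + y;                // new (rounded) running sum
            c = (t - sum) - y;                 // bits lost in the addition
            sum = t;
        }
    }
    return sum / (double)n;  // Monte Carlo estimate of the integral
}
```

With, e.g., one million samples in chunks of 200,000, the estimate lands close to the exact value 0.25; the same compensation step, applied per thread on the GPU, is what keeps the error of very long sums (N up to 4 × 1010 in Table 13.2) under control.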
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
de Doncker, E., Kapenga, J., Assaf, R. (2014). Monte Carlo Automatic Integration with Dynamic Parallelism in CUDA. In: Kindratenko, V. (eds) Numerical Computations with GPUs. Springer, Cham. https://doi.org/10.1007/978-3-319-06548-9_13
DOI: https://doi.org/10.1007/978-3-319-06548-9_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06547-2
Online ISBN: 978-3-319-06548-9