Abstract
This chapter presents the implementation of a batched CUDA solver, based on LU factorization, for small linear systems. The solver can be used in applications such as reactive flow and transport models, which apply the Newton–Raphson method to linearize and iteratively solve the sets of nonlinear equations that represent the reactions at tens of thousands to millions of physical locations. The implementation exploits somewhat counterintuitive GPGPU programming techniques: it assigns the solution of each matrix (representing one system) to a single CUDA thread, does not use shared memory, and employs dynamic memory allocation on the GPU. These techniques enable our implementation to simultaneously solve sets of systems of over 100 equations and to employ LU decomposition with complete pivoting, providing the higher numerical accuracy required by certain applications. Other currently available batched linear solvers are limited in system size and support only partial pivoting, although they may be faster under certain conditions. We discuss the code of our implementation and compare it with the other implementations, examining the tradeoffs in terms of performance and flexibility. This work will enable developers who need batched linear solvers to choose the implementation best suited to the features and requirements of their applications, and even to implement dynamic switching approaches that select the best implementation depending on the input data.
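The per-thread work described above can be illustrated with a small host-side sketch in plain C. This is not the chapter's code: the function names `lu_solve_complete_pivot` and `batched_solve`, the row-major layout, and the in-place overwriting of `A` and `b` are hypothetical choices made for illustration. Each call factorizes one system as PAQ = LU with complete pivoting and solves it. The per-system workspace is obtained with `malloc`, which in a CUDA port would correspond to device-side `malloc` (whose heap size is set from the host with `cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...)`). The loop in `batched_solve` stands in for the CUDA grid: on the GPU, each iteration would be executed by one thread.

```c
#include <math.h>
#include <stdlib.h>

/* Solve A x = b for one n-by-n system (row-major A) using LU factorization
 * with complete pivoting, P A Q = L U. A and b are overwritten.
 * Returns 0 on success, -1 if a zero pivot is found or allocation fails.
 * Hypothetical helper: a sketch, not the chapter's implementation. */
int lu_solve_complete_pivot(int n, double *A, double *b, double *x)
{
    /* Per-system workspace allocated at run time, mirroring per-thread
     * dynamic allocation (device-side malloc in a CUDA version). */
    int *q = malloc((size_t)n * sizeof *q); /* column permutation Q */
    if (q == NULL) return -1;
    for (int j = 0; j < n; ++j) q[j] = j;

    for (int k = 0; k < n; ++k) {
        /* Complete pivoting: search the whole trailing submatrix
         * for the entry of largest magnitude. */
        int pr = k, pc = k;
        double best = 0.0;
        for (int i = k; i < n; ++i)
            for (int j = k; j < n; ++j)
                if (fabs(A[i * n + j]) > best) {
                    best = fabs(A[i * n + j]);
                    pr = i;
                    pc = j;
                }
        if (best == 0.0) { free(q); return -1; }

        /* Row swap, applied to b as well. */
        if (pr != k) {
            for (int j = 0; j < n; ++j) {
                double t = A[k * n + j]; A[k * n + j] = A[pr * n + j]; A[pr * n + j] = t;
            }
            double t = b[k]; b[k] = b[pr]; b[pr] = t;
        }
        /* Column swap, recorded in q (it reorders the unknowns). */
        if (pc != k) {
            for (int i = 0; i < n; ++i) {
                double t = A[i * n + k]; A[i * n + k] = A[i * n + pc]; A[i * n + pc] = t;
            }
            int t = q[k]; q[k] = q[pc]; q[pc] = t;
        }

        /* Eliminate below the pivot; multipliers (L) are stored in place
         * and the same row operations are applied to b. */
        for (int i = k + 1; i < n; ++i) {
            double m = A[i * n + k] / A[k * n + k];
            A[i * n + k] = m;
            for (int j = k + 1; j < n; ++j) A[i * n + j] -= m * A[k * n + j];
            b[i] -= m * b[k];
        }
    }

    /* b now holds L^{-1} P b; back substitution with U yields y = Q^T x. */
    for (int i = n - 1; i >= 0; --i) {
        double s = b[i];
        for (int j = i + 1; j < n; ++j) s -= A[i * n + j] * b[j];
        b[i] = s / A[i * n + i];
    }
    /* Undo the column permutation: x[q[k]] = y[k]. */
    for (int k = 0; k < n; ++k) x[q[k]] = b[k];
    free(q);
    return 0;
}

/* Host-side stand-in for the batched kernel: system s occupies
 * A[s*n*n ...], b[s*n ...] and x[s*n ...]. In a CUDA version the loop body
 * would run in one thread with s = blockIdx.x * blockDim.x + threadIdx.x.
 * Returns the number of systems that could not be solved. */
int batched_solve(int nsys, int n, double *A, double *b, double *x)
{
    int failures = 0;
    for (int s = 0; s < nsys; ++s)
        if (lu_solve_complete_pivot(n, A + (size_t)s * n * n,
                                    b + (size_t)s * n, x + (size_t)s * n) != 0)
            ++failures;
    return failures;
}
```

Complete pivoting searches the entire trailing submatrix at every step, so pivot selection costs O(n^3) comparisons in total versus O(n^2) for partial pivoting; that extra work is what buys the higher numerical robustness the abstract refers to.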
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Tumeo, A., Gawande, N., Villa, O. (2014). A Flexible CUDA LU-Based Solver for Small, Batched Linear Systems. In: Kindratenko, V. (eds) Numerical Computations with GPUs. Springer, Cham. https://doi.org/10.1007/978-3-319-06548-9_5
DOI: https://doi.org/10.1007/978-3-319-06548-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06547-2
Online ISBN: 978-3-319-06548-9
eBook Packages: Computer Science, Computer Science (R0)