Abstract
Many large problems need linear algebra operations with a precision exceeding the standard floating-point binary64 format. In this paper, we implement a multiple-precision scaled vector addition BLAS routine (WAXPBY) on graphics processing units. We use a residue number system (RNS) to represent the significands of floating-point values. In RNS, large numbers are replaced by their residues, and addition, subtraction and multiplication are performed on these residues in parallel and without carry propagation. Our parallel WAXPBY algorithm is divided into several steps, and each step is carried out by a separate GPU kernel. Experiments show that the developed routine clearly outperforms parallel CPU-based multiple-precision implementations.
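The carry-free property of RNS arithmetic can be illustrated with a minimal Python sketch. It is not the authors' GPU implementation: it works on plain non-negative integers rather than floating-point significands, and it omits exponents, signs, rounding and overflow control. The moduli set and all function names (`to_rns`, `rns_add`, `rns_mul`, `from_rns`, `waxpby_rns`) are hypothetical and chosen only for illustration.

```python
from math import prod

# Hypothetical pairwise coprime moduli; the moduli used in the paper are not stated here.
MODULI = (7, 11, 13, 15)
M = prod(MODULI)  # dynamic range: integers in [0, M) are represented uniquely


def to_rns(n):
    """Replace an integer by its residues modulo each modulus."""
    return tuple(n % m for m in MODULI)


def rns_add(a, b):
    """Add digit-wise; each residue channel is independent (no carries)."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def rns_mul(a, b):
    """Multiply digit-wise; again no information crosses between channels."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(r):
    """Recover the integer with the Chinese remainder theorem."""
    total = 0
    for ri, mi in zip(r, MODULI):
        Mi = M // mi
        total += ri * Mi * pow(Mi, -1, mi)  # modular inverse, Python 3.8+
    return total % M


def waxpby_rns(alpha, x, beta, y):
    """Integer analogue of w = alpha*x + beta*y, computed entirely on residues."""
    a, b = to_rns(alpha), to_rns(beta)
    return [
        rns_add(rns_mul(a, to_rns(xi)), rns_mul(b, to_rns(yi)))
        for xi, yi in zip(x, y)
    ]


w = waxpby_rns(3, [10, 20, 30], 5, [1, 2, 3])
print([from_rns(r) for r in w])  # [35, 70, 105]
```

Every vector element and every residue channel is independent in this sketch, which suggests why such a computation maps naturally onto many GPU threads. The results are correct only while the true values stay below `M`.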
Acknowledgement
This work was supported by the Russian Science Foundation (grant number 18-71-00063).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Isupov, K., Kuvaev, A. (2019). Multiple-Precision Scaled Vector Addition on Graphics Processing Unit. In: Malyshkin, V. (ed.) Parallel Computing Technologies. PaCT 2019. Lecture Notes in Computer Science, vol 11657. Springer, Cham. https://doi.org/10.1007/978-3-030-25636-4_14
DOI: https://doi.org/10.1007/978-3-030-25636-4_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-25635-7
Online ISBN: 978-3-030-25636-4
eBook Packages: Computer Science, Computer Science (R0)