Abstract
With the rapid development of supercomputers, large-scale computing has become widespread across scientific research and engineering. The precision and efficiency of large-scale floating-point arithmetic remain a central research topic in high-performance computing. This paper studies numerical methods for solving large-scale sparse linear equations, where rounding errors accumulated during the solution process lead to inaccurate results and the sheer volume of data leads to long solver running times. To address these issues, we combine error-free transformations with mixed-precision techniques to build XHYPRE, a reliable parallel numerical algorithm framework based on HYPRE that improves accuracy and accelerates the solution of large-scale sparse linear equations. We illustrate the implementation through two cases. First, we use error-free transformations to design high-precision iterative algorithms, such as GMRES, PCG, and BiCGSTAB, which reduce rounding errors during the computation and yield more accurate results. Second, we propose a mixed-precision iterative algorithm that uses low-precision formats to achieve higher computing throughput and reduce computing time. Experimental results demonstrate the reliability and effectiveness of XHYPRE: on average, it is 1.3x faster than HYPRE and reduces the number of iterations to 87.1% of HYPRE's.
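To make the idea of an error-free transformation concrete, the following is a minimal sketch in plain Python, not the XHYPRE implementation. It assumes IEEE 754 binary64 arithmetic with round-to-nearest and uses the classical TwoSum (Knuth) and TwoProduct (Dekker, with Veltkamp splitting) building blocks to form a compensated dot product in the style of Ogita, Rump and Oishi. The function names (`two_sum`, `two_prod`, `dot2`, `naive_dot`) are illustrative choices and do not come from the paper.

```python
def two_sum(a, b):
    """TwoSum: return (s, e) with s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    bp = s - a
    e = (a - (s - bp)) + (b - bp)
    return s, e


def split(a):
    """Veltkamp splitting of a binary64 value into high and low halves."""
    c = 134217729.0 * a  # factor is 2**27 + 1
    hi = c - (c - a)
    lo = a - hi
    return hi, lo


def two_prod(a, b):
    """TwoProduct: return (p, e) with p = fl(a * b) and p + e == a * b,
    assuming no overflow or underflow occurs."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e


def dot2(x, y):
    """Compensated dot product: accumulate the rounding errors of every
    product and every partial sum, then add them back at the end."""
    p, s = two_prod(x[0], y[0])
    for xi, yi in zip(x[1:], y[1:]):
        h, r = two_prod(xi, yi)
        p, q = two_sum(p, h)
        s += q + r
    return p + s


def naive_dot(x, y):
    """Ordinary recursive dot product in working precision."""
    acc = 0.0
    for xi, yi in zip(x, y):
        acc += xi * yi
    return acc


x = [1e16, 1.0, -1e16]
y = [1.0, 1.0, 1.0]
print(naive_dot(x, y))  # the 1.0 is lost to rounding
print(dot2(x, y))       # the compensation term recovers it
```

In this small example the exact result is 1, which the naive loop loses because 1.0 is below the rounding granularity of 1e16, while the compensated version carries the lost part in a separate error term. Inner products of this kind dominate Krylov solvers such as GMRES, PCG, and BiCGSTAB, which is why compensating them can reduce accumulated rounding error.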
Acknowledgements
This work was supported by the NuSCAP (ANR-20-CE48-0014) project of the French National Agency for Research (ANR), the 173 program (2020-JCJQ-ZD-029), and the Science Challenge Project (TZ2016002).
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Li, C., Graillat, S., Quan, Z. et al. XHYPRE: a reliable parallel numerical algorithm library for solving large-scale sparse linear equations. CCF Trans. HPC 5, 191–209 (2023). https://doi.org/10.1007/s42514-023-00141-3
DOI: https://doi.org/10.1007/s42514-023-00141-3