Abstract
This study examines mixed-precision iterative refinement using posit numbers in place of standard IEEE floating-point. The method is applied to a general linear system \(Ax = b\), where \(A\) is a large sparse matrix. Multiple scaling strategies, including row and column equilibration, map matrix entries into higher-density regions of the machine numbers before the \(O(n^3)\) factorization is performed. A low-precision LU factorization followed by forward/backward substitution yields an initial estimate \(\hat{x}\). The residual \(r = b - A\hat{x}\) is then computed in higher precision with a deferred rounding mechanism and used as the right-hand side of the correction system \(Ac = r\). The corrector \(c\) is computed and used to update the solution, \(\hat{x} \leftarrow \hat{x} + c\), and the process repeats until convergence. Results show that a 16-bit posit configuration coupled with equilibration yields accuracy comparable to IEEE half precision (fp16), demonstrating the potential of posits for balancing efficiency and accuracy.
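To make the loop concrete, here is a minimal sketch in Python/NumPy under stated assumptions: float32 stands in for the low-precision format (the paper uses 16-bit posits), float64 stands in for the higher-precision residual accumulation (the paper uses a deferred-rounding posit accumulator, the quire), and `equilibrate` and `refine` are illustrative helper names, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def equilibrate(A):
    # One pass of row/column equilibration: scale A so the largest entry
    # in each row and column has magnitude near 1, moving entries toward
    # the dense region of the target number system.
    r = 1.0 / np.abs(A).max(axis=1)        # row scale factors
    As = r[:, None] * A
    c = 1.0 / np.abs(As).max(axis=0)       # column scale factors
    return r, c, As * c[None, :]

def refine(A, b, tol=1e-12, max_iter=20):
    # Mixed-precision iterative refinement for A x = b.
    rs, cs, As = equilibrate(A)
    bs = rs * b                            # scaled right-hand side
    # O(n^3) LU factorization, performed once, in the low working precision.
    lu, piv = lu_factor(As.astype(np.float32))
    y = lu_solve((lu, piv), bs.astype(np.float32)).astype(np.float64)
    for _ in range(max_iter):
        # Residual in higher precision; the paper instead accumulates it
        # with a deferred-rounding posit accumulator (the quire).
        res = bs - As @ y
        if np.linalg.norm(res, np.inf) <= tol * np.linalg.norm(bs, np.inf):
            break
        # Correction system A c = r, reusing the low-precision factors.
        corr = lu_solve((lu, piv), res.astype(np.float32)).astype(np.float64)
        y += corr
    return cs * y                          # undo the column scaling

# Example: a small, well-conditioned system.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100)) + 100.0 * np.eye(100)
b = rng.standard_normal(100)
x = refine(A, b)
print(np.linalg.norm(b - A @ x, np.inf))
```

The economy of the scheme is that the \(O(n^3)\) factorization happens once in low precision, while each refinement step costs only an \(O(n^2)\) residual evaluation and a pair of triangular solves.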
Notes
1. There is always at least one regime bit, and for \(n > 2\) the regime field (the run of identical bits plus its terminating bit) occupies at least two bits. For \(n > 2\), \(1 \le r \le n-1\), and therefore \(-(n-1) \le k \le n-2\) (see the sketch after these notes).
2. Formerly known as the University of Florida Sparse Matrix Collection.
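As a minimal illustration of the bounds in note 1, the hypothetical helper below decodes the regime value \(k\) from a posit bit string with the sign bit stripped: a run of \(r\) ones encodes \(k = r - 1\) and a run of \(r\) zeros encodes \(k = -r\), so \(1 \le r \le n-1\) gives exactly \(-(n-1) \le k \le n-2\).

```python
def regime_value(bits: str) -> int:
    # Decode the regime value k from a posit bit string (sign bit removed).
    # The regime is the leading run of identical bits, terminated by the
    # opposite bit or the end of the string: r ones -> k = r - 1,
    # r zeros -> k = -r.
    lead = bits[0]
    r = len(bits) - len(bits.lstrip(lead))
    return r - 1 if lead == "1" else -r

# For an n-bit posit the run occupies 1 <= r <= n - 1 bits, hence
# -(n-1) <= k <= n-2.
assert regime_value("0001") == -3   # r = 3 zeros -> k = -3
assert regime_value("110") == 1     # r = 2 ones  -> k = 1
```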
Acknowledgments
The first author acknowledges partial support for this research by the Maine Economic Improvement Fund (MEIF). However, the authors received no direct financial support for the research, authorship, and/or publication of this article.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Quinlan, J., Omtzigt, E.T.L. (2024). Iterative Refinement with Low-Precision Posit Arithmetic. In: Michalewicz, M., Gustafson, J., De Silva, H. (eds) Next Generation Arithmetic. CoNGA 2024. Lecture Notes in Computer Science, vol 14666. Springer, Cham. https://doi.org/10.1007/978-3-031-72709-2_3
DOI: https://doi.org/10.1007/978-3-031-72709-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72708-5
Online ISBN: 978-3-031-72709-2
eBook Packages: Computer Science, Computer Science (R0)