
Iterative Refinement with Low-Precision Posit Arithmetic

Conference paper
Next Generation Arithmetic (CoNGA 2024)

Abstract

This study examines mixed-precision iterative refinement using posit numbers in place of standard IEEE floating-point arithmetic. The method is applied to a general linear system \(Ax = b\), where A is a large sparse matrix. Several scaling strategies, including row and column equilibration, map matrix entries into higher-density regions of the machine-number space before the \(O(n^3)\) factorization is performed. A low-precision LU factorization followed by forward/backward substitution yields an initial estimate. The residual \(r = b - Ax\) is computed in higher precision with a deferred rounding mechanism and then used as the right-hand side of a new linear system \(Ac = r\). The corrector c is computed and used to refine the previous solution. Results show that a 16-bit posit configuration coupled with equilibration achieves accuracy comparable to IEEE half precision (fp16), indicating potential for balancing efficiency and accuracy.
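The refinement loop in the abstract (low-precision factorization and solve, high-precision residual, corrector solve, update) can be sketched as follows. This is a minimal illustration, not the authors' implementation: NumPy has no posit type, so IEEE fp16 stands in for the 16-bit low-precision format, only storage rounding is emulated (the solve itself runs in double), and the function names and the simple row equilibration are assumptions for the sketch.

```python
import numpy as np

def lu_solve_low(A, b, low=np.float16):
    """Emulate a low-precision solve: row-equilibrate, round the scaled
    system to `low` (fp16 standing in for a 16-bit posit), then solve.
    Only storage rounding is modeled, not low-precision arithmetic."""
    R = 1.0 / np.abs(A).max(axis=1)               # row equilibration factors
    A_s = (A * R[:, None]).astype(low).astype(np.float64)
    b_s = (b * R).astype(low).astype(np.float64)
    return np.linalg.solve(A_s, b_s)

def iterative_refinement(A, b, iters=10):
    """Mixed-precision iterative refinement: an initial low-precision solve,
    then repeated high-precision residuals r = b - Ax, low-precision
    corrector solves A c = r, and updates x <- x + c."""
    x = lu_solve_low(A, b)                        # initial estimate
    for _ in range(iters):
        r = b - A @ x                             # residual in high precision
        rn = np.linalg.norm(r)
        if rn == 0.0:
            break
        # Normalize r before rounding so small residuals survive
        # the narrow dynamic range of the low-precision format.
        c = rn * lu_solve_low(A, r / rn)          # corrector from A c = r
        x = x + c
    return x
```

On a small well-conditioned system, the initial fp16-rounded solve carries a relative error on the order of the fp16 unit roundoff, and each refinement step shrinks the error by roughly cond(A) times that roundoff until the double-precision floor is reached.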


Notes

  1. There is always at least one regime bit, and for \(n > 2\) there are at least two bits. For \(n > 2\), \(1 \le r \le n-1\); therefore \(-(n-1) \le k \le n-2\).

  2. Formerly known as the University of Florida Sparse Matrix Collection.
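The bound in note 1 follows from how the regime field is decoded: after the sign bit, a run of \(r\) identical bits (terminated by the opposite bit or the end of the word) encodes \(k = r - 1\) for a run of 1s and \(k = -r\) for a run of 0s. A minimal sketch, with a hypothetical helper name, operating on the bits after the sign bit:

```python
def regime_value(bits: str) -> int:
    """Decode the posit regime from the bit string that follows the sign bit.
    A run of r identical leading bits encodes k = r - 1 (ones) or k = -r
    (zeros), so for an n-bit posit (n - 1 bits here) k spans [-(n-1), n-2]."""
    lead = bits[0]
    r = 1
    while r < len(bits) and bits[r] == lead:
        r += 1                                 # extend the leading run
    return r - 1 if lead == "1" else -r
```

For a 4-bit posit (three bits after the sign), "000" gives \(k = -3\) and "111" gives \(k = 2\), matching the stated bounds.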


Acknowledgments

The first author acknowledges partial support for this research from the Maine Economic Improvement Fund (MEIF). Otherwise, the authors received no direct financial support for the research, authorship, or publication of this article.

Author information


Corresponding author

Correspondence to James Quinlan.


Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare relevant to the content of this article.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Quinlan, J., Omtzigt, E.T.L. (2024). Iterative Refinement with Low-Precision Posit Arithmetic. In: Michalewicz, M., Gustafson, J., De Silva, H. (eds) Next Generation Arithmetic. CoNGA 2024. Lecture Notes in Computer Science, vol 14666. Springer, Cham. https://doi.org/10.1007/978-3-031-72709-2_3


  • DOI: https://doi.org/10.1007/978-3-031-72709-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72708-5

  • Online ISBN: 978-3-031-72709-2

  • eBook Packages: Computer Science; Computer Science (R0)
