Detection and correction of silent errors in the conjugate gradient algorithm

  • Original Paper
  • Published in Numerical Algorithms

Abstract

We propose a new way to detect and correct silent errors in the conjugate gradient algorithm. The detection criterion is simple, cheap to implement, and can be used at each iteration. This simplifies the correction process. Numerical experiments show that the new criterion is robust and reliable.

Notes

  1. They can be obtained at https://sparse.tamu.edu.

Acknowledgements

The author thanks Erin Carson for interesting comments and suggestions.

Ethics declarations

Declarations

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Conflict of interest

The author declares no competing interests.

Additional information

This article is dedicated to Claude Brezinski on the occasion of his 80th birthday.

Appendix A: CG local orthogonality

Using (2) it can be shown that

$$\begin{aligned} \vert (r_{k}, r_{k-1})\vert \le {}& \kappa (A)\, \frac{\Vert r_{k-1}\Vert }{\Vert p_{k-1}\Vert } \left[ C_{k-1}^A u\, \frac{\Vert r_{k-1}\Vert }{\Vert p_{k-1}\Vert }\,\frac{\Vert r_{k-1}\Vert ^2}{\Vert r_{k-2}\Vert ^2} + \Vert r_{k-1}\Vert \,\Vert \delta _{k-2}^p\Vert \right] \\ &+ \Vert \delta _{k-1}^r\Vert \,\Vert r_{k-1}\Vert , \end{aligned} \tag{A1}$$

where \(\kappa (A)\) is the condition number of A and \(C_{k-1}^A\) is a constant involved in the bound \(\vert (Ap_{k-2}, p_{k-1})\vert \le \lambda _n C_{k-1}^A u\), with \(\lambda _n\) the largest eigenvalue of A.

The three terms in the right-hand side of inequality (A1) are small provided that the ratios are bounded and \(\kappa (A)\) is not too large. Local orthogonality is, in general, well satisfied.
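The claim that local orthogonality is usually well satisfied can be checked numerically. The following is a minimal sketch, not taken from the article: it runs a textbook CG iteration on an assumed test problem (the tridiagonal matrix tridiag(-1, 2, -1)) and records the normalized quantity \(\vert (r_{k}, r_{k-1})\vert / (\Vert r_{k}\Vert \,\Vert r_{k-1}\Vert )\) at each step. The function names `laplacian_matvec` and `cg_local_orthogonality`, the right-hand side, and the tolerance are all illustrative assumptions; this is not the detection criterion proposed by the author.

```python
import math


def dot(x, y):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(x, y))


def laplacian_matvec(x):
    """Multiply by the SPD matrix tridiag(-1, 2, -1) (assumed test matrix)."""
    n = len(x)
    y = [2.0 * xi for xi in x]
    for i in range(n - 1):
        y[i] -= x[i + 1]
        y[i + 1] -= x[i]
    return y


def cg_local_orthogonality(matvec, b, max_iter, tol=1e-12):
    """Run plain CG from x0 = 0.

    Returns the final iterate and the list of values
    |(r_k, r_{k-1})| / (||r_k|| ||r_{k-1}||), one per completed iteration.
    """
    x = [0.0] * len(b)
    r = list(b)          # r_0 = b - A x_0 = b
    p = list(r)          # p_0 = r_0
    rr = dot(r, r)
    b_norm = math.sqrt(rr)
    cosines = []
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r_new = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r_new, r_new)
        # Stop before normalizing by a residual that is essentially zero.
        if math.sqrt(rr_new) <= tol * b_norm:
            break
        cosines.append(abs(dot(r_new, r)) / math.sqrt(rr_new * rr))
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r_new, p)]
        r, rr = r_new, rr_new
    return x, cosines


if __name__ == "__main__":
    n = 100
    b = [1.0 / (i + 1) for i in range(n)]  # arbitrary right-hand side
    _, cosines = cg_local_orthogonality(laplacian_matvec, b, max_iter=20)
    print("largest normalized |(r_k, r_{k-1})|:", max(cosines))
```

On a well-conditioned problem such as this one, the recorded values are expected to stay within a modest multiple of the unit roundoff, in line with (A1); a value far above that level would indicate that something has perturbed the recurrences.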

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Meurant, G. Detection and correction of silent errors in the conjugate gradient algorithm. Numer Algor 92, 869–891 (2023). https://doi.org/10.1007/s11075-022-01380-1

  • DOI: https://doi.org/10.1007/s11075-022-01380-1
