Abstract
We propose a new way to detect and correct silent errors in the conjugate gradient algorithm. The detection criterion is simple, cheap to implement, and can be used at each iteration. This simplifies the correction process. Numerical experiments show that the new criterion is robust and reliable.











Notes
The test matrices can be obtained from the SuiteSparse Matrix Collection at https://sparse.tamu.edu.
Acknowledgements
The author thanks Erin Carson for interesting comments and suggestions.
Ethics declarations
Declarations
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
Conflict of interest
The author declares no competing interests.
Additional information
This article is dedicated to Claude Brezinski on the occasion of his 80th birthday.
Appendix A: CG local orthogonality
Using (2) it can be shown that
where \(\kappa (A)\) is the condition number of A and \(C_{k-1}^A\) is a constant involved in the bound \(\vert (Ap_{k-2}, p_{k-1})\vert \le \lambda _n C_{k-1}^A u\), where \(\lambda _n\) is the largest eigenvalue of A.
The three terms on the right-hand side of inequality (A1) are small provided that the ratios are bounded and \(\kappa (A)\) is not too large. Hence local orthogonality is, in general, well satisfied.
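The claim that local orthogonality is well satisfied in finite precision can be checked numerically. The sketch below is only an illustration under stated assumptions and is not the detection criterion of the article: it runs the standard Hestenes-Stiefel form of CG on a hypothetical test matrix, the tridiagonal matrix tridiag(-1, 2, -1), and records at each step the normalized values of \(\vert (r_k, p_{k-1})\vert\) and \(\vert (Ap_{k-1}, p_k)\vert\). The function names `matvec_poisson` and `cg_local_orthogonality` are illustrative choices.

```python
import math


def dot(x, y):
    """Euclidean inner product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    """Euclidean norm."""
    return math.sqrt(dot(x, x))


def matvec_poisson(x):
    """Multiply by the SPD matrix tridiag(-1, 2, -1) (assumed test matrix)."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def cg_local_orthogonality(b, iters):
    """Run CG from x_0 = 0 and return the last iterate together with a list
    of pairs (|(r_k, p_{k-1})|, |(A p_{k-1}, p_k)|), each normalized by the
    product of the norms of the two vectors involved."""
    x = [0.0] * len(b)
    r = list(b)          # r_0 = b - A x_0 = b
    p = list(r)          # p_0 = r_0
    rr = dot(r, r)
    history = []
    for _ in range(iters):
        Ap = matvec_poisson(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        if rr_new == 0.0:
            break        # exact convergence, nothing left to measure
        beta = rr_new / rr
        p_new = [ri + beta * pi for ri, pi in zip(r, p)]
        # Local orthogonality: new residual against previous direction.
        res_orth = abs(dot(r, p)) / (norm(r) * norm(p))
        # Local A-conjugacy: new direction against previous A*direction.
        conj = abs(dot(Ap, p_new)) / (norm(Ap) * norm(p_new))
        history.append((res_orth, conj))
        p, rr = p_new, rr_new
    return x, history


if __name__ == "__main__":
    b = [1.0 / (i + 1) for i in range(30)]
    _, history = cg_local_orthogonality(b, 10)
    for k, (res_orth, conj) in enumerate(history, start=1):
        print(f"k={k:2d}  |(r_k,p_k-1)|={res_orth:.2e}  |(Ap_k-1,p_k)|={conj:.2e}")
```

For this well-conditioned example both quantities are expected to stay many orders of magnitude below one during the first iterations, in line with the statement above. For matrices with a large \(\kappa (A)\), or when successive residual norms differ by a large factor, larger values would be expected.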
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Meurant, G. Detection and correction of silent errors in the conjugate gradient algorithm. Numer Algor 92, 869–891 (2023). https://doi.org/10.1007/s11075-022-01380-1
DOI: https://doi.org/10.1007/s11075-022-01380-1