Abstract
This paper proposes a new fault detection mechanism for the computation of eigenvalues and eigenvectors, the so called eigenproblem, for which no such scheme existed before, to the best of our knowledge. It consists of a number of assertions that can be executed on the results of the computation to determine their correctness. The proposed scheme follows the Result Checking principle, since it does not depend on the particular numerical algorithm used. It can handle both real and complex matrices, symmetric or not. Many practical issues are handled, like rounding errors and eigenvalue ordering, and a practical implementation was built on top of unmodified routines of the well-known LAPACK library. The proposed scheme is simultaneously very efficient, with less than 2% performance overhead for medium to large matrices, very effective, since it exhibited a fault coverage greater than 99.7% with a confidence level of 99%, when subjected to extensive fault-injection experiments, and very easy to adapt to other libraries of mathematical routines besides LAPACK.
This work was partially supported by the Portuguese Ministério da Ciência e Tecnologia, the European Union through the R&D Unit 326/94 (CISUC) and the project PRAXIS XXI 2/2.1/TIT/1625/95 (PARQUANTUM).
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Huang, K.-H. and J. A. Abraham, Algorithm-Based Fault Tolerance for Matrix Operations, in IEEE Transactions on Computers, 1984, p. 518–528.
Banerjee, P., et al., Algorithm-Based Fault Tolerance on a Hypercube Multiprocessor, in IEEE Transactions on Computers, 1990, p. 1132–1144.
Chowdhury, A. R. and P. Banerjee. Algorithm-Based Fault Location and Recovery for Matrix Computations in 24th International Symposium on Fault-Tolerant Computing, 1994. Austin, Texas, p. 38–47.
Rela, M. Z., H. Madeira, and J. G. Silva. Experimental Evaluation of the Fail-Silent Behavior of Programs with Consistency Checks in 26th International Symposium on Fault-Tolerant Computing, 1996. Sendai-Japan, p. 394–403.
Silva, J. G., J. Carreira, H. Madeira, D. Costa, and F. Moreira. Experimental Assessment of Parallel Systems in 26th International Symposium on Fault-Tolerant Computing, 1996. Sendai, Japan, p. 415–424.
Chen, C.-Y. and A. Abraham. Fault-tolerant Systems for the computation of Eigenvalues and Singular Values in Proc. SPIE, Advanced Algorithms Architectures Signal Processing, 1986, p. 228–237.
Balasubramanian, V. and P. Banerjee, Algorithm-Based Error Detection for Signal Processing Applications on a Hypercube Multiprocessor, in Real-Time Systems Symposium, 1989, p. 134–143.
Blum, M. and S. Kannan, Designing Programs that Check Their Work. Journal of the Association for Computing Machinery, 1995. 42(1): p. 269–291.
Prata, P. and J. G. Silva. Algorithm Based Fault Tolerance Versus Result-Checking for Matrix Computations. To appear in 29th International Symposium on Fault-Tolerant Computing, 1999. Madison, Wisconsin, USA.
Velde, E. F. V. d., Concurrent Scientific Computing. 1994: Springer-Verlag.
Demmel, J. W., Applied Numerical Linear Algebra. 1997: SIAM.
Anderson, E., Z. Bai, C. Bischof, and e. al., LAPACK Users’ Guide. 1995: SIAM.
Blum, M. and H. Wasserman, Reflections on The Pentium Division Bug, in IEEE Transactions on Computers, 1996, p. 385–393.
Rubinfeld, R., A Mathematical Theory of Self-Checking, Self-Testing and Self-Correcting Programs, PhD Thesis. University of California at Berkeley, 1990. 103 pages.
Wasserman, H. and M. Blum, Software Reliability via Run-Time Result-Checking. Journal of the ACM, 1997. 44(6): p. 826–849.
Rubinfeld, R. Robust functional equations with applications to self-testing / correcting in 35th IEEE Conference on Foundations of Computer Science, 1994, p. 288–299.
Silva, J. G., P. Prata, M. Rela, and H. Madeira. Practical Issues in the Use of ABFT and a New Failure Model in 28th International Symposium on Fault-Tolerant Computing, 1998. Munich, Germany, p. 26–35.
Golub, G. H. and C. F. V. Loan, Matrix Computations. Second edition ed. 1989: Johns Hopkins University Press.
Watkins, D. S., Fundamentals of Matrix Computations. 1991: John Wiley & Sons.
Reddy, A. L. N. and P. Banerjee, Algorithm-Based Fault Detection for Signal Processing Applications, in IEEE Transactions on Computers, 1990, p. 1304–1308.
Jou, J.-Y. and J. A. Abraham, Fault-Tolerant Matrix Arithmetic and Signal Processing on Highly Concurrent Computing Structures, in Proceedings of the IEEE, 1986, p. 732–741.
Higham, N., Accuracy and Stability of Numerical Algorithms. 1996: SIAM.
Powell, D., M. Cukier, and J. Arlat. On Stratified Sampling for High Coverage Estimations in 2nd European Dependable Computing Conference, 1996. Taormina, Italy, p. 37–54.
Carreira, J., H. Madeira, and J. G. Silva, Xception: A Technique for the Experimental Evaluation of Dependability in Modern Computers, in IEEE Transactions on Software Engineering, 1998, p. 125–135.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Prata, P., Silva, J.G. (1999). Fault-Detection by Result-Checking for the Eigenproblem1 . In: Hlavička, J., Maehle, E., Pataricza, A. (eds) Dependable Computing — EDCC-3. EDCC 1999. Lecture Notes in Computer Science, vol 1667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48254-7_28
Download citation
DOI: https://doi.org/10.1007/3-540-48254-7_28
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66483-3
Online ISBN: 978-3-540-48254-3
eBook Packages: Springer Book Archive