Fault-Detection by Result-Checking for the Eigenproblem

Prata, Paula; Silva, João Gabriel

doi:10.1007/3-540-48254-7_28

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1667)

Included in the following conference series:

European Dependable Computing Conference

Abstract

This paper proposes a new fault-detection mechanism for the computation of eigenvalues and eigenvectors, the so-called eigenproblem, for which, to the best of our knowledge, no such scheme existed before. It consists of a number of assertions that can be executed on the results of the computation to determine their correctness. The proposed scheme follows the Result-Checking principle, since it does not depend on the particular numerical algorithm used. It can handle both real and complex matrices, symmetric or not. Many practical issues are handled, such as rounding errors and eigenvalue ordering, and a practical implementation was built on top of unmodified routines of the well-known LAPACK library. The proposed scheme is very efficient, with less than 2% performance overhead for medium to large matrices. It is very effective, exhibiting a fault coverage greater than 99.7% with a confidence level of 99% when subjected to extensive fault-injection experiments. It is also very easy to adapt to other libraries of mathematical routines besides LAPACK.
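To make the idea of result checking concrete, here is a minimal sketch in Python. The abstract does not list the paper's actual assertions, so the two checks used here (a trace identity and a per-pair residual test) are assumptions chosen for illustration. The function name `check_eigenpairs`, the helper `inf_norm`, and the tolerance rule are likewise hypothetical. The checker sees only the input matrix and the claimed results, never the algorithm that produced them.

```python
# Illustrative result checker for the eigenproblem.
# NOTE: hypothetical sketch; these are not the assertions defined in the paper.
from typing import Sequence

Matrix = Sequence[Sequence[complex]]
Vector = Sequence[complex]


def inf_norm(a: Matrix) -> float:
    """Infinity norm of a matrix: the largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)


def check_eigenpairs(a: Matrix, eigenvalues: Vector,
                     eigenvectors: Sequence[Vector],
                     tol: float = 1e-10) -> bool:
    """Return True if the claimed eigenpairs are consistent with `a`.

    eigenvectors[k] is the vector claimed for eigenvalues[k].
    Comparisons are written as `not (x <= bound)` so that a NaN
    produced by a fault fails the check instead of slipping through.
    """
    n = len(a)
    if len(eigenvalues) != n or len(eigenvectors) != n:
        return False
    scale = inf_norm(a) or 1.0  # avoid a zero bound for the zero matrix

    # Assumed assertion 1: the eigenvalues must sum to the trace of A.
    trace = sum(a[i][i] for i in range(n))
    if not abs(sum(eigenvalues) - trace) <= tol * n * scale:
        return False

    # Assumed assertion 2: each pair must satisfy A v = lambda v up to
    # a rounding-error bound that scales with the norms of A and v.
    for lam, v in zip(eigenvalues, eigenvectors):
        if len(v) != n:
            return False
        v_norm = max(abs(x) for x in v)
        if not v_norm > 0.0:  # rejects the zero vector and NaN entries
            return False
        bound = tol * n * scale * v_norm
        for i in range(n):
            residual = sum(a[i][j] * v[j] for j in range(n)) - lam * v[i]
            if not abs(residual) <= bound:
                return False
    return True


sym = [[2.0, 1.0], [1.0, 2.0]]
vectors = [[1.0, -1.0], [1.0, 1.0]]
print(check_eigenpairs(sym, [1.0, 3.0], vectors))  # True: correct results
print(check_eigenpairs(sym, [1.0, 3.5], vectors))  # False: corrupted eigenvalue
```

Because each eigenvalue is tested against its own eigenvector, the order in which the pairs are returned does not matter in this sketch. These two checks alone are not claimed to be complete; for example, they say nothing about whether the returned eigenvectors are linearly independent.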

This work was partially supported by the Portuguese Ministério da Ciência e Tecnologia, the European Union through the R&D Unit 326/94 (CISUC) and the project PRAXIS XXI 2/2.1/TIT/1625/95 (PARQUANTUM).

Author information

Authors and Affiliations

Dep. Matemática/Informática, Universidade da Beira Interior, Rua Marquês d’Ávila e Bolama, P- 6200, Covilhã, Portugal
Paula Prata
Dep. Eng. Informática/CISUC, Universidade de Coimbra, Pinhal de Marrocos, P-3030, Coimbra, Portugal
João Gabriel Silva

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Czech Technical University in Prague, Karlovo nam 13, CZ-12135, Prague 2, Czech Republic
Jan Hlavička
Institut für Technische Informatik, Medizinische Universität zu Lübeck, Ratzeburger Allee 160, 23538, Lübeck, Germany
Erik Maehle
Department of Measurement and Information Systems, Technical University of Budapest, Pázmány P. sétány 1/d, H-1521, Budapest, Hungary
András Pataricza

About this paper

Cite this paper

Prata, P., Silva, J.G. (1999). Fault-Detection by Result-Checking for the Eigenproblem. In: Hlavička, J., Maehle, E., Pataricza, A. (eds) Dependable Computing — EDCC-3. EDCC 1999. Lecture Notes in Computer Science, vol 1667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48254-7_28

DOI: https://doi.org/10.1007/3-540-48254-7_28
Published: 24 March 2000
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66483-3
Online ISBN: 978-3-540-48254-3
eBook Packages: Springer Book Archive

Publish with us

Policies and ethics