Fault-Detection by Result-Checking for the Eigenproblem

  • Conference paper
Dependable Computing — EDCC-3 (EDCC 1999)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1667)

Abstract

This paper proposes a new fault-detection mechanism for the computation of eigenvalues and eigenvectors, the so-called eigenproblem, for which, to the best of our knowledge, no such scheme existed before. It consists of a number of assertions that are executed on the results of the computation to determine their correctness. The scheme follows the Result-Checking principle, since it does not depend on the particular numerical algorithm used. It handles both real and complex matrices, symmetric or not. Practical issues such as rounding errors and eigenvalue ordering are addressed, and an implementation was built on top of unmodified routines of the well-known LAPACK library. The scheme is efficient, with less than 2% performance overhead for medium to large matrices; effective, with a fault coverage greater than 99.7% at a confidence level of 99% in extensive fault-injection experiments; and easy to adapt to libraries of mathematical routines other than LAPACK.
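
The abstract does not list the paper's actual assertions, so the following is only a minimal sketch of what a result check for the eigenproblem could look like. It assumes two textbook properties: each computed pair should satisfy A v ≈ λ v, and the eigenvalues should sum to the trace of A. The function name `check_eigenpairs`, the tolerance value, and the norm-based scaling are illustrative assumptions, not the authors' scheme.

```python
import math


def check_eigenpairs(A, eigenvalues, eigenvectors, tol=1e-10):
    """Return True if the claimed eigenpairs of A pass two result checks.

    A            : n x n matrix as a list of rows (real or complex entries)
    eigenvalues  : list of n numbers
    eigenvectors : list of n vectors; eigenvectors[k] pairs with eigenvalues[k]
    tol          : relative tolerance absorbing rounding errors (assumed value)
    """
    n = len(A)
    if len(eigenvalues) != n or len(eigenvectors) != n:
        return False

    # Infinity norm of A (largest absolute row sum) scales the tolerance.
    norm_A = max(sum(abs(x) for x in row) for row in A)
    scale = max(norm_A, 1.0)

    # Check 1: residual A v - lambda v must be small for every pair.
    for lam, v in zip(eigenvalues, eigenvectors):
        if len(v) != n:
            return False
        norm_v = max(abs(x) for x in v)
        if norm_v == 0 or not math.isfinite(norm_v):
            return False  # a zero or non-finite vector is not an eigenvector
        residual = max(
            abs(sum(A[i][j] * v[j] for j in range(n)) - lam * v[i])
            for i in range(n)
        )
        # "not <=" also rejects NaN residuals.
        if not residual <= tol * scale * norm_v:
            return False

    # Check 2: the eigenvalues must sum to the trace of A.
    trace = sum(A[i][i] for i in range(n))
    return abs(sum(eigenvalues) - trace) <= tol * scale * n


# Usage: a real symmetric 2 x 2 matrix with known eigenpairs.
s = 1 / math.sqrt(2)
A = [[2.0, 1.0], [1.0, 2.0]]
ok = check_eigenpairs(A, [1.0, 3.0], [[s, -s], [s, s]])
corrupted = check_eigenpairs(A, [1.0, 3.5], [[s, -s], [s, s]])
```

The checker sees only the input matrix and the claimed results, never the solver, which is what makes it independent of the numerical algorithm. It does not depend on the order in which eigenvalues are listed, as long as each eigenvalue stays paired with its own eigenvector.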

This work was partially supported by the Portuguese Ministério da Ciência e Tecnologia, the European Union through the R&D Unit 326/94 (CISUC) and the project PRAXIS XXI 2/2.1/TIT/1625/95 (PARQUANTUM).



Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Prata, P., Silva, J.G. (1999). Fault-Detection by Result-Checking for the Eigenproblem. In: Hlavička, J., Maehle, E., Pataricza, A. (eds) Dependable Computing — EDCC-3. EDCC 1999. Lecture Notes in Computer Science, vol 1667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48254-7_28

  • DOI: https://doi.org/10.1007/3-540-48254-7_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66483-3

  • Online ISBN: 978-3-540-48254-3

  • eBook Packages: Springer Book Archive
