
Neutron sensitivity and software hardening strategies for matrix multiplication and FFT on graphics processing units

Published: 18 June 2013

ABSTRACT

In this paper, we compare the radiation response of GPUs executing matrix multiplication and FFT algorithms. The experimental results demonstrate that, for both algorithms, the output is affected by multiple errors in the majority of cases. Architectural and code analyses indicate that these multiple errors are caused by corruption of shared resources or by dependencies between threads. The experimental data and analytical studies can be used to evaluate the expected error rate of GPUs in realistic applications and to design specific, optimized software-based hardening procedures.
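As a rough illustration of what a software-based check for multiple output errors in matrix multiplication might look like, the sketch below verifies row and column checksums of a product on the host side. It is a generic checksum-style check written for this page, not the hardening procedure evaluated in the paper; the function names `matmul` and `checksum_mismatches` and the tolerance `tol` are hypothetical.

```python
def matmul(a, b):
    """Plain triple-loop product of two matrices given as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]


def checksum_mismatches(a, b, c, tol=1e-9):
    """Return (bad_rows, bad_cols): rows/columns of c whose sums disagree
    with the sums predicted from a and b alone.

    Row check:    sum_j c[i][j] == a[i] . (row sums of b)
    Column check: sum_i c[i][j] == (column sums of a) . b[:, j]
    """
    b_row_sums = [sum(row) for row in b]
    a_col_sums = [sum(col) for col in zip(*a)]
    bad_rows = []
    for i, row in enumerate(c):
        expected = sum(x * s for x, s in zip(a[i], b_row_sums))
        if abs(sum(row) - expected) > tol:
            bad_rows.append(i)
    bad_cols = []
    for j, col in enumerate(zip(*c)):
        expected = sum(s * r[j] for s, r in zip(a_col_sums, b))
        if abs(sum(col) - expected) > tol:
            bad_cols.append(j)
    return bad_rows, bad_cols


# Hypothetical example: two corrupted elements in the same output column.
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = matmul(a, b)
c[0][0] += 1
c[1][0] += 2
print(checksum_mismatches(a, b, c))
```

A single corrupted element shows up as one bad row and one bad column, so it can be located. Several corrupted elements flag several rows or columns at once, and at that point recomputation is usually simpler than correction. The check can also miss errors that happen to cancel within a row or column sum.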


      • Published in

        FTXS '13: Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale
        June 2013
        64 pages
        ISBN:9781450319836
        DOI:10.1145/2465813

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • research-article

        Acceptance Rates

        • FTXS '13 Paper Acceptance Rate: 7 of 10 submissions, 70%
        • Overall Acceptance Rate: 16 of 25 submissions, 64%
