research-article

Neutron sensitivity and software hardening strategies for matrix multiplication and FFT on graphics processing units

Authors:
Paolo Rech, UFRGS, Porto Alegre, Brazil
Laercio Pilla, UFRGS, Porto Alegre, Brazil
Francesco Silvestri, Università di Padova, Padova, Italy
Philippe Navaux, UFRGS, Porto Alegre, Brazil
Luigi Carro, UFRGS, Porto Alegre, Brazil

FTXS '13: Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale
June 2013, Pages 13–20
https://doi.org/10.1145/2465813.2465816

Published: 18 June 2013

ABSTRACT

In this paper, we compare the radiation response of GPUs executing matrix multiplication and FFT algorithms. The experimental results show that, for both algorithms, the output is affected by multiple errors in the majority of cases. Architectural and code analyses indicate that these multiple errors are caused by the corruption of shared resources or by thread dependencies. The experimental data and analytical studies can be used to estimate the expected error rate of GPUs in realistic applications and to design specific, optimized software-based hardening procedures.
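
As a rough illustration of what a software-based hardening procedure for matrix multiplication can look like, the sketch below applies a checksum test in the spirit of algorithm-based fault tolerance (Huang and Abraham). It is not the procedure evaluated in this paper: the function names `matmul` and `abft_check`, the tolerance `tol`, and the pure-Python, CPU-only setting are all assumptions made for illustration. The check compares the row and column sums of the output against sums predicted from the inputs, so it reports which rows and columns of the result look corrupted and can tell a single corrupted element apart from a multiple-error pattern.

```python
def matmul(a, b):
    """Plain product of two matrices given as lists of rows (illustrative only)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def abft_check(a, b, c, tol=1e-9):
    """Return (bad_rows, bad_cols): indices of c whose sums disagree with
    the checksums predicted from a and b. Hypothetical helper, not from the paper."""
    n, m, p = len(a), len(b), len(b[0])

    # Checksum inputs: column sums of A and row sums of B.
    col_sum_a = [sum(a[i][k] for i in range(n)) for k in range(m)]
    row_sum_b = [sum(b[k][j] for j in range(p)) for k in range(m)]

    # Predicted column sums of C are (1^T A) B; predicted row sums are A (B 1).
    exp_col = [sum(col_sum_a[k] * b[k][j] for k in range(m)) for j in range(p)]
    exp_row = [sum(a[i][k] * row_sum_b[k] for k in range(m)) for i in range(n)]

    # Compare the predictions with the sums actually found in C.
    bad_cols = [j for j in range(p)
                if abs(sum(c[i][j] for i in range(n)) - exp_col[j]) > tol]
    bad_rows = [i for i in range(n)
                if abs(sum(c[i][j] for j in range(p)) - exp_row[i]) > tol]
    return bad_rows, bad_cols
```

With this sketch, a single corrupted element flags exactly one row and one column, while a corrupted row flags one row and several columns. Errors whose contributions cancel within a row or column sum go undetected, which is one reason a practical scheme would need more than this minimal check.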

Index Terms

1. Hardware
  1. Hardware test
  2. Robustness


Published in
FTXS '13: Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale
June 2013
64 pages
ISBN: 9781450319836
DOI: 10.1145/2465813
Program Chairs:
Nathan DeBardeleben, Los Alamos National Laboratory, USA
Jon Stearley, Sandia National Laboratory, USA
Franck Cappello, INRIA and University of Illinois at Urbana Champaign, France and USA
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
GPU
parallel architectures sensitivity
radiation effects
software-based hardening
Qualifiers
- research-article
Conference

Acceptance Rates
FTXS '13 Paper Acceptance Rate: 7 of 10 submissions, 70%
Overall Acceptance Rate: 16 of 25 submissions, 64%