Deep Soft Error Propagation Modeling Using Graph Attention Network

Ma, Junchi; Duan, Zongtao; Tang, Lei

doi:10.1007/s10836-022-06005-y

Deep Soft Error Propagation Modeling Using Graph Attention Network

Published: 08 June 2022

Volume 38, pages 303–319, (2022)
Cite this article

Journal of Electronic Testing Aims and scope Submit manuscript

438 Accesses
Explore all metrics

Abstract

Soft errors are increasing in computer systems due to shrinking feature sizes. Soft errors can induce incorrect outputs, also called silent data corruption (SDC), which raises no warnings in the system and hence is difficult to detect. To prevent SDC effectively, protection techniques require a fine-grained profiling of SDC-prone instructions, which is often obtained by applying machine learning models. However, these models rely on handcrafted features, and lack the ability to reason about SDC propagation, which leads to an inferior SDC prediction performance. We propose a novel Graph Attention neTwork to Predict SDC-prone instructions (GATPS). The GATPS representation is a heterogeneous graph with different types of edges to represent various instruction relations. By stacking layers in which nodes are able to attend over their neighborhoods’ features, GATPS automatically captures the structural features that contribute to SDC propagation. The attention mechanism is applied to compute the importance values to the neighboring nodes, which quantifies the fault effect on the neighboring nodes. Moreover, the inductive model of GATPS can be applied to unseen programs without retraining, and it requires no fault injection information of the target program. Experiments revealed GATPS achieved a 34% higher F1 score compared to the baseline method and a 40-fold speedup compared to the fault injection approach.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Instruction SDC Vulnerability Prediction Using Long Short-Term Memory Neural Network

SDC Error Detection by Exploring the Importance of Instruction Features

Detection Method of Hardware Trojan Based on Attention Mechanism and Residual-Dense-Block under the Markov Transition Field

Article 07 December 2023

Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

References

Abadi M, Barham P, Chen J et al (2016) Tensorflow: A system for large-scale machine learning. In: Proc. USENIX symposium on operating systems design and implementation (OSDI). IEEE, pp 265–283
Benacchio T, Bonaventura L, Altenbernd M et al (2021) Resilience and fault tolerance in high-performance computing for numerical weather and climate prediction. Int J High Perform Comput Appl 35(4):285–311
Dixit HD, Pendharkar S, Beadon M et al (2021) Silent data corruptions at scale. arXiv preprint. http://arxiv.org/abs/2102.11245
Fang B, Lu Q, Pattabiraman K et al (2016) ePVF: An enhanced program vulnerability factor methodology for cross-layer resilience analysis. In: Dependable Systems and Networks (DSN). IEEE, pp 168–179
Gao Y, Gupta SK, Wang Y et al (2014) An energy-aware fault tolerant scheduling framework for soft error resilient cloud computing systems. In: Proc. Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, pp 1–6
Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, pp 249–256
Guo L, Li D, Laguna I (2021) Paris: Predicting application resilience using machine learning. J Parallel Distrib Comput
Hashimoto M, Wang L (2020) Soft error and its countermeasures in terrestrial environment. In: Proc. Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, pp 617–622
Hari SKS, Adve SV, Naeimi H (2012) Low-cost program-level detectors for reducing silent data corruptions. In: Dependable Systems and Networks (DSN). IEEE, pp 1–12
Hari SKS, Adve SV, Naeimi H et al (2012) Relyzer: Exploiting application-level fault equivalence to analyze application resiliency to transient faults. In: Architectural Support for Programming Languages and Operating Systems (ASPLOS). ACM, pp 123–134
Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. In: Advances in Neural Information Processing Systems (NIPS). IEEE, pp 1024–1034
Hong H, Guo H, Lin Y et al (2020) An attention-based graph neural network for heterogeneous structural learning. In: Proc. Conference on Artificial Intelligence (AAAI). AI Access Foundation, pp 4132–4139
Kalra C, Previlon F, Rubin N et al (2020) Armorall: Compiler-based resilience targeting gpu applications. ACM Trans Archit Code Optim 17(2):1–24
Article Google Scholar
Laguna I, Schulz M, Richards DF et al (2016) Ipas: Intelligent protection against silent output corruption in scientific applications. In: Proc. International Symposium on Code Generation and Optimization (CGO). IEEE, pp 227–238
Li G, Pattabiraman K (2018) Modeling input-dependent error propagation in programs. In: Dependable Systems and Networks (DSN). IEEE, pp 279–290
Li G, Pattabiraman K, Hari SKS et al (2018) Modeling soft-error propagation in programs. In: Dependable Systems and Networks (DSN). IEEE, pp 27–38
Li Z, Menon H, Maljovec D et al (2020) SpotSDC: Revealing the silent data corruption propagation in high-performance computing systems. IEEE Trans Vis Comput Graph
Li Z, Menon H, Mohror K et al (2021) Understanding a program's resiliency through error propagation. In: Proc. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP). ACM, pp 362–373
Liu C, Gu J, Yan Z et al (2019) SDC-causing error detection based on lightweight vulnerability prediction. In: Proc. Asian Conference on Machine Learning (ACML). IEEE, pp 1049–1064
Lu Q, Pattabiraman K, Gupta MS et al (2014) SDCTune: A model for predicting the SDC proneness of an application for configurable protection. In: Compilers, Architecture and Synthesis for Embedded Systems (CASES). ACM, pp 1–10
Luk CK, Cohn R, Muth R et al (2005) Pin: building customized program analysis tools with dynamic instrumentation. ACM Sigplan Notices 40(6):190–200
Article Google Scholar
Ma J, Wang Y (2017) Characterization of stack behavior under soft errors. In: Proc. Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, pp 1538–1543
Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605
MATH Google Scholar
Schlichtkrull M, Kipf TN, Bloem P et al (2018) Modeling relational data with graph convolutional networks. In: Proc. European Semantic Web Conference. Springer, pp 593–607
Velickovic P, Cucurull G, Casanova A et al (2018) Graph attention networks. In: Proc. International Conference on Learning Representations (ICLR). IEEE, pp 1–12
Xin X, Li ML (2012) Understanding soft error propagation using efficient vulnerability-driven fault injection. In: Dependable Systems and Networks (DSN). IEEE, pp 1–12
Yang N, Wang Y (2019) Predicting the silent data corruption vulnerability of instructions in programs. In Proc. International Conference on Parallel and Distributed Systems (ICPADS). IEEE, pp 862–869

Download references

Funding

This work was funded by the Natural Science Foundation of China (No.62002030), Key research and development plan project of the Shaanxi Province, China (No.2019ZDLGY17-08, 2019ZDLGY03-09–01, 2019GY-006, 2020GY-013).

Author information

Authors and Affiliations

School of Information Engineering, Chang’an University, Middle Section of Nan’erhuan Road, Xi’an, Shaanxi, China
Junchi Ma, Zongtao Duan & Lei Tang

Authors

Junchi Ma
View author publications
You can also search for this author inPubMed Google Scholar
Zongtao Duan
View author publications
You can also search for this author inPubMed Google Scholar
Lei Tang
View author publications
You can also search for this author inPubMed Google Scholar

Corresponding author

Correspondence to Junchi Ma.

Ethics declarations

Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Responsible Editor: A. Yan

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Ma, J., Duan, Z. & Tang, L. Deep Soft Error Propagation Modeling Using Graph Attention Network. J Electron Test 38, 303–319 (2022). https://doi.org/10.1007/s10836-022-06005-y

Download citation

Received: 02 March 2022
Accepted: 27 May 2022
Published: 08 June 2022
Issue Date: June 2022
DOI: https://doi.org/10.1007/s10836-022-06005-y

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Deep Soft Error Propagation Modeling Using Graph Attention Network

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Instruction SDC Vulnerability Prediction Using Long Short-Term Memory Neural Network

SDC Error Detection by Exploring the Importance of Instruction Features

Detection Method of Hardware Trojan Based on Attention Mechanism and Residual-Dense-Block under the Markov Transition Field

Data Availability

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Competing Interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now