research-article

SLOGAN: SDC Probability Estimation Using Structured Graph Attention Network

Authors: Junchi Ma, Sulei Huang, Zongtao Duan, Lei Tang, Luyang Wang (all Chang'an University, Xi'an, Shaanxi Province, China)

ASPDAC '23: Proceedings of the 28th Asia and South Pacific Design Automation Conference, January 2023, Pages 296–301, https://doi.org/10.1145/3566097.3567910

Published: 31 January 2023


ABSTRACT

Progressive technology scaling makes computing systems more susceptible to soft errors. The most critical consequence of a soft error is silent data corruption (SDC), which occurs without any warning to users. Estimating the SDC probability of a program is the first and essential step toward designing a protection mechanism. Prior work suffers from prediction inaccuracy because the proposed heuristic-based models fail to describe the semantics of fault propagation. We propose SLOGAN, a novel approach that transforms the prediction of SDC probability into a graph regression task. A program is represented as a dynamic dependence graph. To capture the rich semantics of fault propagation, we apply a structured graph attention network, which includes node-level, graph-level, and layer-level self-attention. With the attention coefficients learned at these three levels, the importance of edges, nodes, and layers to fault propagation can be fully considered. We generate the graph embedding by weighted aggregation of the node embeddings and compute the SDC probability with a regression model. Experiments show that SLOGAN estimates SDC probability more accurately than state-of-the-art methods at a low time cost.
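
The pipeline described in the abstract (attention over edges, then over nodes, then over layers, followed by regression) can be illustrated with a minimal sketch. This is not the authors' implementation: the parameter names (`a_src`, `a_dst`, `q`, `p`, `w`, `b`), the omission of learned linear transforms and nonlinearities, and the sigmoid output are all assumptions made to keep the example short, and the toy graph and weights are invented for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(weights, vectors):
    """Return sum_i weights[i] * vectors[i], computed per dimension."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def node_level_attention(features, preds, a_src, a_dst):
    """Each node aggregates itself and its predecessors in the dependence graph.

    The softmax weights play the role of edge importance (assumed scoring form).
    """
    out = []
    for v, h_v in enumerate(features):
        neigh = [v] + preds[v]
        scores = [dot(a_dst, h_v) + dot(a_src, features[u]) for u in neigh]
        alpha = softmax(scores)
        out.append(weighted_sum(alpha, [features[u] for u in neigh]))
    return out

def graph_level_readout(features, q):
    """Weighted aggregation of node embeddings; weights act as node importance."""
    beta = softmax([dot(q, h) for h in features])
    return weighted_sum(beta, features)

def estimate_sdc_probability(features, preds, params, num_layers=2):
    """Stack attention layers, combine per-layer graph embeddings, then regress."""
    h = features
    layer_embeddings = []
    for _ in range(num_layers):
        h = node_level_attention(h, preds, params["a_src"], params["a_dst"])
        layer_embeddings.append(graph_level_readout(h, params["q"]))
    # Layer-level attention: weigh the graph embedding produced by each layer.
    gamma = softmax([dot(params["p"], g) for g in layer_embeddings])
    graph_embedding = weighted_sum(gamma, layer_embeddings)
    z = dot(params["w"], graph_embedding) + params["b"]
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid keeps the output in (0, 1)

# Hypothetical toy graph: three dynamic instructions with 2-d features;
# preds[v] lists the nodes whose results node v depends on.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
preds = [[], [0], [0, 1]]
params = {
    "a_src": [0.3, -0.2], "a_dst": [0.1, 0.4],
    "q": [0.5, 0.5], "p": [0.2, -0.1],
    "w": [0.7, -0.3], "b": -0.1,
}
print(estimate_sdc_probability(features, preds, params))
```

In a trained model the parameters would be learned from fault-injection results rather than set by hand; the sketch only shows how the three attention levels compose into a single graph embedding that feeds the regressor.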



Published in
ASPDAC '23: Proceedings of the 28th Asia and South Pacific Design Automation Conference, January 2023, 807 pages
ISBN: 9781450397834
DOI: 10.1145/3566097
General Chair: Atsushi Takahashi, Tokyo Institute of Technology
Copyright © 2023 ACM
Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates
ASPDAC '23 paper acceptance rate: 102 of 328 submissions, 31%. Overall acceptance rate: 466 of 1,454 submissions, 32%.
