DOI: 10.1145/3566097.3567910
research-article

SLOGAN: SDC Probability Estimation Using Structured Graph Attention Network

Published: 31 January 2023

ABSTRACT

The trend of progressive technology scaling makes computing systems more susceptible to soft errors. The most critical consequence of a soft error is silent data corruption (SDC), which occurs without any warning to users. Estimating the SDC probability of a program is the first and essential step toward designing a protection mechanism. Prior work suffers from prediction inaccuracy because the proposed heuristic-based models fail to describe the semantics of fault propagation. We propose a novel approach, SLOGAN, which transforms the prediction of SDC probability into a graph regression task. A program is represented as a dynamic dependence graph. To capture the rich semantics of fault propagation, we apply a structured graph attention network that includes node-level, graph-level, and layer-level self-attention. With the attention coefficients learned at these three levels, the importance of edges, nodes, and layers to fault propagation can be fully considered. We generate the graph embedding by a weighted aggregation of the node embeddings and compute the SDC probability with a regression model. Experiments show that SLOGAN achieves higher SDC prediction accuracy than state-of-the-art methods at a low time cost.
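The pipeline the abstract describes (attention over a dependence graph, a weighted aggregation of node embeddings, then regression) can be illustrated with a minimal sketch. This is not the SLOGAN implementation. The function names, the toy graph, and all weights are hypothetical, and the abstract does not specify the scoring functions, so a generic GAT-style score is assumed. Layer-level attention and learned projection matrices are omitted for brevity.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def node_attention_layer(features, neighbors, attn_vec):
    """One simplified attention step over each node's neighborhood.

    features:  list of node feature vectors (all of length d).
    neighbors: neighbors[i] lists the nodes that node i depends on.
    attn_vec:  hypothetical scoring vector of length 2*d applied to the
               concatenation [h_i || h_j] (assumed GAT-style scoring).
    """
    d = len(features[0])
    updated = []
    for i, h_i in enumerate(features):
        nbrs = [i] + list(neighbors[i])  # include a self-loop
        scores = [dot(attn_vec, h_i + features[j]) for j in nbrs]
        scores = [s if s > 0 else 0.2 * s for s in scores]  # LeakyReLU
        alphas = softmax(scores)  # one coefficient per incoming edge
        updated.append([sum(a * features[j][k] for a, j in zip(alphas, nbrs))
                        for k in range(d)])
    return updated

def graph_readout(features, node_score_vec):
    """Graph embedding as an attention-weighted sum of node embeddings."""
    betas = softmax([dot(node_score_vec, h) for h in features])
    d = len(features[0])
    embedding = [sum(b * h[k] for b, h in zip(betas, features))
                 for k in range(d)]
    return embedding, betas

def predict_sdc_probability(graph_embedding, weights, bias):
    """Linear regression head squashed into (0, 1) with a sigmoid (assumed)."""
    z = dot(weights, graph_embedding) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Toy dependence graph: node 2 depends on nodes 0 and 1.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[], [], [0, 1]]
attn_vec = [0.5, -0.5, 0.25, 0.25]  # hypothetical, untrained weights

hidden = node_attention_layer(features, neighbors, attn_vec)
embedding, betas = graph_readout(hidden, [1.0, 1.0])
p = predict_sdc_probability(embedding, [0.3, -0.2], 0.1)
print(p)
```

In a trained model the scoring vectors and regression weights would be learned from fault-injection data. Here they are fixed constants chosen only to make the data flow visible.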


Published in

ASPDAC '23: Proceedings of the 28th Asia and South Pacific Design Automation Conference
January 2023, 807 pages
ISBN: 9781450397834
DOI: 10.1145/3566097

          Copyright © 2023 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ASPDAC '23 paper acceptance rate: 102 of 328 submissions, 31%. Overall acceptance rate: 466 of 1,454 submissions, 32%.
