Exploiting program-level masking and error propagation for constrained reliability optimization

Published: 29 May 2013

Abstract

Because embedded systems are designed under stringent constraints, designing them for reliability means optimizing within a tolerable overhead budget. This paper presents a novel reliability-driven compilation scheme that optimizes software program reliability under such overhead constraints. The scheme exploits program-level error masking and propagation properties to prioritize instructions by their reliability impact and to protect them selectively during compilation. To enable this, we develop statistical models for estimating error masking and propagation probabilities. The scheme improves reliability efficiency significantly (30%-60% on average) compared to state-of-the-art program-level protection schemes.
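The idea of prioritizing instructions and protecting only some of them under an overhead budget can be pictured with a small sketch. This is not the paper's algorithm: the abstract does not give the statistical models or the selection procedure, so the sketch assumes a simple product model for each instruction's contribution to failures and a greedy selection by benefit per unit of overhead. All names (`Instr`, `failure_contribution`, `select_for_protection`) and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    p_fault: float      # assumed probability that a fault hits this instruction
    p_mask: float       # estimated probability that the resulting error is masked
    p_propagate: float  # estimated probability that an unmasked error reaches the output
    cost: int           # assumed overhead of protecting it (e.g. extra cycles), must be > 0


def failure_contribution(ins: Instr) -> float:
    # Assumed model: a fault matters only if it is not masked and then propagates.
    return ins.p_fault * (1.0 - ins.p_mask) * ins.p_propagate


def select_for_protection(instrs, budget):
    # Rank by failure contribution removed per unit of overhead (greedy heuristic).
    ranked = sorted(
        instrs,
        key=lambda ins: failure_contribution(ins) / ins.cost,
        reverse=True,
    )
    chosen, spent = [], 0
    for ins in ranked:
        # Protect an instruction only if it still fits in the overhead budget.
        if spent + ins.cost <= budget:
            chosen.append(ins.name)
            spent += ins.cost
    return chosen, spent


# Hypothetical instruction mix; the probabilities are made up for illustration.
instrs = [
    Instr("load", 0.02, 0.10, 0.90, 2),
    Instr("and_mask", 0.02, 0.80, 0.50, 1),
    Instr("mul", 0.03, 0.20, 0.80, 3),
    Instr("store", 0.01, 0.05, 1.00, 2),
]
chosen, spent = select_for_protection(instrs, budget=5)
```

With a budget of 5, the heavily masked `and_mask` instruction ranks last, which reflects the intuition that protection is best spent where errors are unlikely to be masked and likely to propagate. A greedy ranking is only a heuristic for this knapsack-style problem and does not guarantee the optimal selection.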

Published In

DAC '13: Proceedings of the 50th Annual Design Automation Conference
May 2013
1285 pages
ISBN:9781450320719
DOI:10.1145/2463209

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

DAC '13

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Cited By

  • (2024) Fault Tolerant Architectures. Handbook of Computer Architecture, pp. 277-320. DOI: 10.1007/978-981-97-9314-3_11. Online publication date: 21-Dec-2024
  • (2023) Fault Tolerant Architectures. Handbook of Computer Architecture, pp. 1-44. DOI: 10.1007/978-981-15-6401-7_11-1. Online publication date: 17-Feb-2023
  • (2022) Special Session: Towards an Agile Design Methodology for Efficient, Reliable, and Secure ML Systems. 2022 IEEE 40th VLSI Test Symposium (VTS), pp. 1-14. DOI: 10.1109/VTS52500.2021.9794253. Online publication date: 25-Apr-2022
  • (2021) Fast and Accurate SER Estimation for Large Combinational Blocks in Early Stages of the Design. IEEE Transactions on Sustainable Computing, 6:3, pp. 427-440. DOI: 10.1109/TSUSC.2018.2886640. Online publication date: 1-Jul-2021
  • (2021) Emerging Computing Devices: Challenges and Opportunities for Test and Reliability. 2021 IEEE European Test Symposium (ETS), pp. 1-10. DOI: 10.1109/ETS50041.2021.9465409. Online publication date: 24-May-2021
  • (2021) Evaluating Soft Error Mitigation Trade-offs During Early Design Stages. Architecture of Computing Systems, pp. 213-228. DOI: 10.1007/978-3-030-81682-7_14. Online publication date: 15-Jul-2021
  • (2020) Cross-layer approaches for improving the dependability of deep learning systems. Proceedings of the 23th International Workshop on Software and Compilers for Embedded Systems, pp. 78-81. DOI: 10.1145/3378678.3391884. Online publication date: 25-May-2020
  • (2020) Fast Construction of Discrete Geodesic Graphs. ACM Transactions on Graphics, 39:2, pp. 1-14. DOI: 10.1145/3144567. Online publication date: 18-Mar-2020
  • (2020) Dependable Deep Learning: Towards Cost-Efficient Resilience of Deep Neural Network Accelerators against Soft Errors and Permanent Faults. 2020 IEEE 26th International Symposium on On-Line Testing and Robust System Design (IOLTS), pp. 1-4. DOI: 10.1109/IOLTS50870.2020.9159734. Online publication date: Jul-2020
  • (2020) PB-IFMC: A Selective Soft Error Protection Method Based on Instruction Fault Masking Capability. 2020 25th International Computer Conference, Computer Society of Iran (CSICC), pp. 1-9. DOI: 10.1109/CSICC49403.2020.9050059. Online publication date: Jan-2020