DOI: 10.1145/3139315.3141787
research-article

System-level reliability analysis considering imperfect fault coverage

Published: 15 October 2017

Abstract

Safety-critical systems rely on redundancy schemes such as k-out-of-n structures, which enable tolerance against multiple faults. These techniques are subject to Imperfect Fault Coverage (IFC), as error detection and recovery might themselves be prone to errors or even impossible for certain fault models. As a result, these techniques may act as single points of failure in the system, where uncovered faults might be overlooked and lead to wrong system outputs. Neglecting IFC in reliability analysis may lead to fatal overestimations of reliability in safety-critical applications. Yet, existing techniques that do consider IFC are overly pessimistic in assuming that the occurrence of an uncovered fault always results in a system failure. Often, however, and in particular in complex systems with nested redundant structures, a fault that is not noticed by an inner redundancy scheme might be caught by an outer redundancy scheme. This paper proposes to automatically incorporate IFC into reliability models, i.e., Binary Decision Diagrams (BDDs), to enable an accurate reliability analysis for complex system structures including nested redundancies and repeated components. It also shows that IFC does not affect different redundancy schemes equally. Experimental results presented for applications in multimedia and automotive confirm that the proposed approach can analyze system reliability more accurately, at an acceptable execution time and memory overhead compared to the underlying IFC-unaware technique.
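
To make the effect of neglecting IFC concrete, the following is a minimal sketch of the pessimistic model the abstract criticizes, not the BDD-based approach the paper proposes. It assumes identical, independent components and a single coverage probability per component fault, and it assumes that any uncovered fault fails the whole structure. The function name `k_out_of_n_reliability` and the parameters `r` (component reliability) and `c` (coverage probability) are hypothetical names chosen for illustration.

```python
from math import comb


def k_out_of_n_reliability(k: int, n: int, r: float, c: float = 1.0) -> float:
    """Reliability of a k-out-of-n:G structure under a pessimistic IFC model.

    Assumptions (illustrative, not taken from the paper):
      * all n components are identical and fail independently,
      * each component works with probability r,
      * each component fault is covered with probability c,
      * a single uncovered fault fails the whole structure.
    With c = 1.0 this reduces to the usual IFC-unaware binomial formula.
    """
    if not 1 <= k <= n:
        raise ValueError("require 1 <= k <= n")
    total = 0.0
    # The structure survives if at most n - k components fail
    # and every one of those faults is covered.
    for failed in range(n - k + 1):
        working = n - failed
        total += comb(n, failed) * r ** working * ((1.0 - r) * c) ** failed
    return total


if __name__ == "__main__":
    # Made-up numbers: 2-out-of-3 voting with 90% component reliability.
    print("IFC neglected:", k_out_of_n_reliability(2, 3, 0.9))
    print("coverage 0.9 :", k_out_of_n_reliability(2, 3, 0.9, c=0.9))
```

With these made-up values, the estimate that neglects IFC is higher than the one that accounts for it, which is the overestimation the abstract warns about. Under the same assumptions, a 1-out-of-3 structure with `r = 0.9` and `c = 0.9` comes out slightly less reliable than a 1-out-of-2 structure, so adding redundancy does not always help once coverage is imperfect. Because the sketch treats every uncovered fault as fatal, it cannot credit an outer redundancy scheme that catches a fault missed by an inner one.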

Cited By

  • (2021) Reliability Analysis of Systems Subject to Imperfect Fault Coverage Considering Failure Propagation and Component Relevancy. 2021 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), 210-217. DOI: 10.1109/ISSREW53611.2021.00065. Online publication date: Oct-2021.
  • (2019) Reliability Analysis of Phased-Mission System in Irrelevancy Coverage Model. 2019 IEEE 19th International Conference on Software Quality, Reliability and Security (QRS), 89-96. DOI: 10.1109/QRS.2019.00025. Online publication date: Jul-2019.

Published In

ESTIMedia '17: Proceedings of the 15th IEEE/ACM Symposium on Embedded Systems for Real-Time Multimedia
October 2017
102 pages
ISBN:9781450351171
DOI:10.1145/3139315

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. binary decision diagrams
  2. imperfect fault coverage
  3. redundancy
  4. reliability

Qualifiers

  • Research-article

Conference

ESWEEK'17: Thirteenth Embedded System Week
October 15 - 20, 2017
Seoul, Republic of Korea

Acceptance Rates

ESTIMedia '17 Paper Acceptance Rate: 6 of 14 submissions, 43%
Overall Acceptance Rate: 15 of 39 submissions, 38%
