Abstract
Scalable Software Support for Dependable Embedded Systems (S³DES) achieves fault tolerance by applying spatial, software-based triple modular redundancy to computational and voter processes at the application level. Because the replicas execute in parallel on distinct CPU cores, S³DES takes a step towards software-based fault tolerance against both transient and permanent random hardware errors. In addition, compliance with real-time requirements in terms of response time is improved compared to similar approaches. The replicated voters, the newly introduced mutual voter monitoring, and the optimized arithmetic encoding allow voter failures to be detected and compensated without resorting to backward recovery. Fault injection experiments on real hardware show that S³DES detects and masks all injected data and program flow errors under a single-fault assumption, whereas an uncoded voting scheme yields approximately 12% silent data corruptions in a comparable experiment.
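The abstract combines three mechanisms: majority voting over three replica results, arithmetic encoding of the values being voted on, and replicated voters that monitor each other. The Python sketch below shows how these ideas can fit together. It is not the S³DES implementation. It assumes a plain AN-code (value x stored as x·A) rather than the paper's optimized encoding. The constant `A`, all function names, and the reuse of the voting routine for voter monitoring are hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical encoding constant (odd, > 1); NOT taken from S³DES.
A = 58659


def encode(x: int) -> int:
    """AN-encode a non-negative integer: the code word is x * A."""
    return x * A


def is_valid(code: int) -> bool:
    """A word is a valid code word only if it is a multiple of A."""
    return code % A == 0


def decode(code: int) -> int:
    """Recover the functional value; reject words that are not multiples of A."""
    if not is_valid(code):
        raise ValueError("invalid code word: hardware error detected")
    return code // A


def vote(replicas: list[int]) -> tuple[int, list[int]]:
    """Majority vote over three AN-coded replica results.

    Returns the winning (still encoded) value and the indices of the
    replicas that were invalid or disagreed with the majority.
    """
    if len(replicas) != 3:
        raise ValueError("TMR voting expects exactly three replicas")
    counts = Counter(c for c in replicas if is_valid(c))
    if not counts:
        raise RuntimeError("no valid replica result")
    winner, n = counts.most_common(1)[0]
    if n < 2:
        raise RuntimeError("no majority among valid replica results")
    faulty = [i for i, c in enumerate(replicas) if c != winner]
    return winner, faulty


def check_voters(voter_outputs: list[int]) -> tuple[int, list[int]]:
    """Hypothetical mutual voter monitoring: the three voters' encoded
    outputs are compared with each other; a voter in the minority is
    reported as failed and the majority output is used directly,
    so no backward recovery (re-execution) is required."""
    return vote(voter_outputs)


if __name__ == "__main__":
    x = 21
    # Replica 2 suffers a single bit flip (bit 4) in its encoded result.
    results = [encode(2 * x), encode(2 * x), encode(2 * x) ^ 0x10]
    winner, faulty = vote(results)
    print(decode(winner), faulty)  # prints: 42 [2]
```

Because `A` is odd and greater than 1, flipping bit k changes a code word by ±2^k, which is never a multiple of `A`. The corrupted replica therefore fails the validity check and is outvoted by the two valid replicas. Without encoding, a corrupted value could still be voted on as if it were a legitimate result.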
Notes
1. We selected the Aurix TriBoard platform to allow comparability with other approaches that used similar hardware. In the future, S³DES will be evaluated on a more suitable and representative ARM system, such as the Renesas R-Car series, which runs a POSIX-compliant operating system and lacks built-in hardware safety mechanisms. The R-Car platform provides two quad-core CPUs and is therefore less constrained in terms of core usage.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Osinski, L., Mottok, J. (2019). S³DES - Scalable Software Support for Dependable Embedded Systems. In: Schoeberl, M., Hochberger, C., Uhrig, S., Brehm, J., Pionteck, T. (eds) Architecture of Computing Systems – ARCS 2019. ARCS 2019. Lecture Notes in Computer Science, vol 11479. Springer, Cham. https://doi.org/10.1007/978-3-030-18656-2_2
DOI: https://doi.org/10.1007/978-3-030-18656-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18655-5
Online ISBN: 978-3-030-18656-2
eBook Packages: Computer Science, Computer Science (R0)