ABSTRACT
Software-based fault-tolerance mechanisms can increase the reliability of multi-core CPUs while being cheaper and more flexible than hardware solutions such as lockstep architectures. However, checkpoint creation, error detection, and error correction incur high performance overhead when implemented in software. We propose a hybrid software/hardware approach that leverages Intel's hardware transactional memory (TSX) to support implicit checkpoint creation and fast rollback. We also propose and evaluate hardware enhancements, with which the performance overhead is 19% on average.
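The checkpoint, compare, and rollback cycle described in the abstract can be illustrated with a small software-only sketch. It is not the authors' implementation and does not use TSX: a deep copy of the state stands in for the implicit checkpoint that a hardware transaction would provide, and discarding the copies stands in for a transaction abort. The names `run_redundant`, `fault_hook`, and `flip_once` are hypothetical and exist only for this illustration.

```python
import copy
from typing import Callable, Dict

State = Dict[str, int]


def run_redundant(
    state: State,
    block: Callable[[State], None],
    fault_hook: Callable[[int, State], None] = lambda attempt, s: None,
    max_retries: int = 3,
) -> int:
    """Run `block` twice on private copies of `state`; commit only if both agree.

    Returns the number of rollbacks that were needed before a commit.
    """
    for attempt in range(max_retries + 1):
        # Implicit checkpoint: `state` itself is not modified until commit.
        leading = copy.deepcopy(state)
        trailing = copy.deepcopy(state)
        block(leading)
        block(trailing)
        # Hypothetical fault-injection point, used here only for demonstration.
        fault_hook(attempt, trailing)
        if leading == trailing:
            # Error detection passed: commit the result.
            state.clear()
            state.update(leading)
            return attempt
        # Mismatch: drop both copies (rollback) and re-execute.
    raise RuntimeError("results still differ after all retries")


def increment(s: State) -> None:
    s["counter"] += 1


def flip_once(attempt: int, s: State) -> None:
    # Simulate a transient bit flip in the trailing copy on the first attempt.
    if attempt == 0:
        s["counter"] ^= 1 << 3


state = {"counter": 41}
rollbacks = run_redundant(state, increment, flip_once)
print(state, rollbacks)
```

In this sketch the first attempt is discarded because the two copies disagree, and the second attempt commits. On real hardware the copying cost would largely disappear, since a transaction buffers its writes and an abort restores the pre-transaction state without software checkpointing.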
Index Terms
- POSTER: Fault-tolerant Execution on COTS Multi-core Processors with Hardware Transactional Memory Support