Dynamic Reliability Management for Multi-Core Processor Based on Deep Reinforcement Learning | IEEE Conference Publication | IEEE Xplore

Abstract:

In this paper, we propose a new dynamic reliability management (DRM) approach based on deep reinforcement learning (DRL) for multi-core processors that considers both device reliability effects (hard errors) and transient signal errors (soft errors). The proposed method is based on a recently proposed physics-based three-phase electromigration model and an exponential soft error model that accounts for dynamic voltage and frequency scaling (DVFS) effects. Our work is inspired by recent advances in DRL for various control and game applications. Compared with the traditional Q-learning based method, DRL has better scalability, lower memory usage, and lower computational complexity. A large class of multi-threaded applications is used as the benchmark to validate and compare the proposed dynamic reliability management methods. Experimental results show that the proposed method significantly reduces memory footprint and computation time compared with the traditional Q-learning based method. Furthermore, we show that the DRL-based DRM method can save 53.50% more energy than the Q-learning based method and 61.29% more than the simple DVFS based method.
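The scalability claim can be illustrated with a rough count of what each approach has to store. The sketch below is not the paper's formulation: it assumes, purely for illustration, that the state is one discretized reading per core (for example a temperature bin), that the action is a single chip-wide DVFS level, and that the DRL agent uses a Q-network with one hidden layer. All function names and parameter values are hypothetical.

```python
def q_table_entries(num_cores: int, bins_per_core: int, num_levels: int) -> int:
    """Entries in a tabular Q-function, |S| * |A|.

    Assumption: the state is one discretized value per core, so the number
    of states grows exponentially with the core count.
    """
    num_states = bins_per_core ** num_cores
    return num_states * num_levels


def q_network_parameters(num_cores: int, hidden_units: int, num_levels: int) -> int:
    """Weights plus biases of a one-hidden-layer Q-network.

    Assumption: the input is one continuous value per core and the output
    is one Q-value per chip-wide DVFS level.
    """
    input_to_hidden = num_cores * hidden_units + hidden_units
    hidden_to_output = hidden_units * num_levels + num_levels
    return input_to_hidden + hidden_to_output


if __name__ == "__main__":
    # Hypothetical settings: 10 bins per core, 5 DVFS levels, 32 hidden units.
    for cores in (2, 4, 8):
        table = q_table_entries(cores, bins_per_core=10, num_levels=5)
        network = q_network_parameters(cores, hidden_units=32, num_levels=5)
        print(f"{cores} cores: Q-table entries = {table}, network parameters = {network}")
```

Under these assumptions the table grows exponentially with the number of cores while the network grows linearly, which is the general reason function approximation tends to scale better than a lookup table. The actual state and action definitions used in the paper are not given in the abstract.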
Date of Conference: 15-18 July 2019
Date Added to IEEE Xplore: 15 August 2019
Conference Location: Lausanne, Switzerland
