Processing-in-Memory Accelerator for Dynamic Neural Network with Run-Time Tuning of Accuracy, Power and Latency | IEEE Conference Publication | IEEE Xplore

Processing-in-Memory Accelerator for Dynamic Neural Network with Run-Time Tuning of Accuracy, Power and Latency


Abstract:

With the widely deployment of powerful deep neural network (DNN) into smart, but resource limited IoT devices, many prior works have been proposed to compress DNN in a ha...Show More

Abstract:

With the widely deployment of powerful deep neural network (DNN) into smart, but resource limited IoT devices, many prior works have been proposed to compress DNN in a hardware-aware manner to reduce the computing complexity, while maintaining accuracy, such as weight quantization, pruning, convolution decomposition, etc. However, in typical DNN compression methods, a smaller, but fixed, network structure is generated from a relative large background model for resource limited hardware accelerator deployment. However, such optimization lacks the ability to tune its structure on-the-fly to best fit for a dynamic computing hardware resource allocation and workloads. In this paper, we mainly review two of our prior works [1], [2] to address this issue, discussing how to construct a dynamic DNN structure through either uniform or non-uniform channel selection based sub-network sampling. The constructed dynamic DNN could tune its computing path to involve different number of channels, thus providing the ability to trade-off between speed, power and accuracy on-the-fly after model deployment. Correspondingly, an emerging Spin-Orbit Torque Magnetic Random-Access-Memory (SOT-MRAM) based Processing-In-Memory (PIM) accelerator will also be discussed for such dynamic neural network structure.
Date of Conference: 08-11 September 2020
Date Added to IEEE Xplore: 06 September 2021
ISBN Information:

ISSN Information:

Conference Location: Las Vegas, NV, USA

Funding Agency:


Contact IEEE to Subscribe

References

References is not available for this document.