
Design and Analysis of a Neural Network Inference Engine Based on Adaptive Weight Compression


Abstract:

Neural networks generally require significant memory capacity and bandwidth to store and access a large number of synaptic weights. This paper presents the design of an energy-efficient neural network inference engine based on adaptive weight compression using a JPEG image encoding algorithm. To maximize the compression ratio with minimal accuracy loss, the quality factor of the JPEG encoder is adaptively controlled depending on the accuracy impact of each block. With 1% accuracy loss, the proposed approach achieves 63.4× compression for a multilayer perceptron (MLP) and 31.3× for LeNet-5 with the MNIST dataset, and 15.3× for AlexNet and 10.2× for ResNet-50 with ImageNet. The reduced memory requirement leads to higher throughput and lower energy for neural network inference (3× effective memory bandwidth and 22× lower system energy for MLP).
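
The abstract's key mechanism is per-block JPEG encoding of weight matrices, with the quality factor chosen adaptively from each block's accuracy impact. Below is a minimal sketch of that idea, not the authors' implementation: it uses the standard IJG luminance quantization table and quality scaling, treats a max-absolute reconstruction-error budget as a stand-in for the paper's per-block accuracy metric, and assumes weights are rescaled to an 8-bit-like range before encoding, as a fixed-point front end would do. The function names and the `err_budget` parameter are illustrative.

```python
import numpy as np
from scipy.fft import dctn, idctn

# Standard JPEG luminance quantization table; the base for all quality factors.
JPEG_Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
], dtype=np.float64)

def scaled_table(quality):
    """Scale the base table by the JPEG quality factor (IJG convention)."""
    quality = max(1, min(100, quality))
    s = 5000 / quality if quality < 50 else 200 - 2 * quality
    return np.clip(np.round(JPEG_Q * s / 100.0), 1, None)

def roundtrip_block(block, quality):
    """Lossy DCT -> quantize -> dequantize -> inverse DCT for one 8x8 block."""
    table = scaled_table(quality)
    coeffs = dctn(block, norm='ortho')
    return idctn(np.round(coeffs / table) * table, norm='ortho')

def adaptive_compress(weights, err_budget=0.01, qualities=(5, 10, 25, 50, 75, 90)):
    """Per 8x8 block, pick the lowest JPEG quality whose reconstruction error
    stays within err_budget (a stand-in for the paper's per-block accuracy
    impact), and return the weights as the inference engine would see them."""
    scale = 127.0 / max(np.abs(weights).max(), 1e-12)  # map to 8-bit-like range
    h, w = (d - d % 8 for d in weights.shape)          # whole 8x8 blocks only
    out = weights.copy()
    for i in range(0, h, 8):
        for j in range(0, w, 8):
            block = weights[i:i+8, j:j+8] * scale
            for q in qualities:  # try the most aggressive (lowest) quality first
                rec = roundtrip_block(block, q)
                if np.abs(rec - block).max() / scale <= err_budget:
                    break        # falls back to the highest quality if none pass
            out[i:i+8, j:j+8] = rec / scale
    return out
```

In the paper, the per-block control signal is the measured accuracy impact rather than a raw reconstruction error, and the quantized DCT coefficients would additionally be entropy-coded (run-length and Huffman coding, as in JPEG) to realize the reported compression ratios; the sketch only shows the lossy round trip whose output the inference engine computes with.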
Page(s): 109 - 121
Date of Publication: 02 February 2018

