Abstract:
Neural networks generally require significant memory capacity and bandwidth to store and access a large number of synaptic weights. This paper presents the design of an energy-efficient neural network inference engine based on adaptive weight compression using a JPEG image encoding algorithm. To maximize the compression ratio with minimum accuracy loss, the quality factor of the JPEG encoder is adaptively controlled depending on the accuracy impact of each block. With 1% accuracy loss, the proposed approach achieves 63.4× compression for a multilayer perceptron (MLP) and 31.3× for LeNet-5 with the MNIST dataset, and 15.3× for AlexNet and 10.2× for ResNet-50 with ImageNet. The reduced memory requirement leads to higher throughput and lower energy for neural network inference (3× effective memory bandwidth and 22× lower system energy for MLP).
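The per-block quality control described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes weights have already been mapped to 8-bit values in 0..255 and tiled into 8×8 blocks, it uses the example luminance quantization table from the JPEG standard with the libjpeg-style quality scaling, it omits entropy coding, and it substitutes a per-block reconstruction-error budget (`max_mse`, a hypothetical parameter) for the accuracy impact that the paper measures. The names `scaled_table`, `compress_block`, and `pick_quality` are illustrative only.

```python
import math

N = 8  # JPEG operates on 8x8 blocks

# Example luminance quantization table from Annex K of the JPEG standard.
BASE_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

# Precomputed DCT basis values and normalization factors.
COS = [[math.cos((2 * x + 1) * u * math.pi / 16) for x in range(N)] for u in range(N)]
NORM = [math.sqrt(0.5) if k == 0 else 1.0 for k in range(N)]


def scaled_table(quality):
    """Scale BASE_Q by a quality factor in 1..100 (libjpeg-style convention)."""
    quality = max(1, min(100, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [[max(1, min(255, (q * scale + 50) // 100)) for q in row] for row in BASE_Q]


def dct2(block):
    """Forward 2-D DCT-II of an 8x8 block."""
    return [[0.25 * NORM[u] * NORM[v] * sum(block[x][y] * COS[u][x] * COS[v][y]
                                             for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coef):
    """Inverse 2-D DCT of an 8x8 coefficient block."""
    return [[0.25 * sum(NORM[u] * NORM[v] * coef[u][v] * COS[u][x] * COS[v][y]
                        for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def compress_block(block, quality):
    """Quantize one 8x8 block of values in 0..255; return (quantized, reconstruction)."""
    table = scaled_table(quality)
    coef = dct2([[v - 128 for v in row] for row in block])  # level shift, then DCT
    quant = [[round(coef[u][v] / table[u][v]) for v in range(N)] for u in range(N)]
    deq = [[quant[u][v] * table[u][v] for v in range(N)] for u in range(N)]
    recon = [[v + 128 for v in row] for row in idct2(deq)]
    return quant, recon


def mse(a, b):
    """Mean squared error between two 8x8 blocks."""
    return sum((a[x][y] - b[x][y]) ** 2 for x in range(N) for y in range(N)) / (N * N)


def pick_quality(block, max_mse, candidates=(10, 30, 50, 70, 90, 100)):
    """Return the lowest candidate quality whose reconstruction error fits max_mse.

    ASSUMPTION: max_mse stands in for the per-block accuracy impact used in the paper.
    """
    for quality in candidates:
        _, recon = compress_block(block, quality)
        if mse(block, recon) <= max_mse:
            return quality
    return candidates[-1]
```

In this sketch, a block judged sensitive would be given a small `max_mse` and therefore a higher quality factor, while an insensitive block tolerates a lower quality factor and quantizes more coefficients to zero.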
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (Volume: 38, Issue: 1, January 2019)