ABSTRACT
Deep Neural Networks (DNNs) represent the state of the art in many Artificial Intelligence (AI) tasks involving images, videos, text, and natural language. Their ubiquitous adoption is limited by their high computation and storage requirements, especially for energy-constrained inference at the edge on wearable and IoT devices. One promising approach to alleviating the computational challenge is to implement DNNs using a low-precision (<16 bit) fixed-point representation. However, the quantization error inherent in any Fixed Point (FxP) implementation limits the bit-widths that can be used while maintaining application-level accuracy. Prior efforts recommend increasing the network size and/or re-training the DNN to minimize the loss due to quantization, albeit with limited success.
Complementary to the above approaches, we present Compensated-DNN, wherein we dynamically compensate for the error introduced by quantization during execution. To this end, we introduce a new fixed-point representation, viz. Fixed Point with Error Compensation (FPEC). The bits in FPEC are split between computation bits and compensation bits. The computation bits use conventional FxP notation to represent the number at low precision. The compensation bits (at most 1 or 2 bits) explicitly capture an estimate (direction and magnitude) of the quantization error in the representation. For a given word length, FPEC uses fewer computation bits than an FxP representation, yielding a near-quadratic improvement in the energy of multiply-and-accumulate (MAC) operations. The compensation bits are simultaneously used by a low-overhead sparse compensation scheme to estimate the error accrued during MAC operations, which is then added to the MAC output to minimize the impact of quantization. We build Compensated-DNNs for 7 popular image recognition benchmarks with 0.05-20.5 million neurons and 0.01-15.5 billion connections. Based on gate-level analysis at 14nm technology, we achieve 2.65×-4.88× and 1.13×-1.7× improvements in energy over 16-bit and 8-bit FxP implementations, respectively, while maintaining <0.5% loss in classification accuracy.
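To make the idea concrete, the following is a minimal Python/NumPy sketch of FPEC-style quantization and a compensated dot product. The names (fpec_quantize, compensated_dot), the round-to-nearest quantizer, and the two-level error-magnitude estimates (0.125 and 0.375 ULP) are illustrative assumptions, not the paper's exact FPEC encoding; likewise, the dense per-element compensation shown here only demonstrates the arithmetic, not the low-overhead sparse hardware scheme described above.

import numpy as np

def fpec_quantize(x, frac_bits):
    # Computation bits: round-to-nearest fixed point with `frac_bits` fractional bits.
    ulp = 2.0 ** -frac_bits
    q = np.round(x / ulp) * ulp
    # Compensation bits (illustrative): 1-bit error direction + 1-bit coarse magnitude.
    err = x - q                                  # true quantization error, |err| <= ulp/2
    direction = np.sign(err)                     # error direction bit
    large = np.abs(err) >= ulp / 4               # error magnitude bit
    err_est = direction * np.where(large, 0.375 * ulp, 0.125 * ulp)
    return q, err_est

def compensated_dot(w, a, frac_bits):
    # exact:  sum((w_q + e_w) * (a_q + e_a))
    # approx: sum(w_q * a_q) + sum(e_w * a_q) + sum(w_q * e_a),
    # with the second-order term e_w * e_a dropped.
    w_q, e_w = fpec_quantize(np.asarray(w, dtype=float), frac_bits)
    a_q, e_a = fpec_quantize(np.asarray(a, dtype=float), frac_bits)
    mac = w_q @ a_q                              # low-precision MAC
    comp = e_w @ a_q + w_q @ e_a                 # compensation added to the MAC output
    return mac + comp

# Quick check: compensation should shrink the gap to the full-precision result.
rng = np.random.default_rng(0)
w, a = rng.uniform(-1, 1, 256), rng.uniform(-1, 1, 256)
exact = w @ a
plain = fpec_quantize(w, 3)[0] @ fpec_quantize(a, 3)[0]
print(abs(exact - plain), abs(exact - compensated_dot(w, a, 3)))

In this sketch the compensation term costs extra multiplies; the appeal of the actual scheme is that the compensation bits are only 1-2 bits wide and the compensation is applied sparsely, so the correction is far cheaper than widening the computation bits themselves.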