DOI: 10.1145/3195970.3196012

Compensated-DNN: energy efficient low-precision deep neural networks by compensating quantization errors

Published: 24 June 2018

ABSTRACT

Deep Neural Networks (DNNs) represent the state of the art in many Artificial Intelligence (AI) tasks involving images, videos, text, and natural language. Their ubiquitous adoption is limited by their high computation and storage requirements, especially for energy-constrained inference tasks at the edge on wearable and IoT devices. One promising approach to alleviating these computational challenges is implementing DNNs using a low-precision (<16 bit) fixed-point representation. However, the quantization error inherent in any Fixed Point (FxP) implementation limits how aggressively bit-widths can be reduced while maintaining application-level accuracy. Prior efforts recommend increasing the network size and/or re-training the DNN to minimize the loss due to quantization, albeit with limited success.
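
To make the quantization error concrete, here is a minimal sketch (in Python/NumPy, with bit-widths chosen purely for illustration and not taken from the paper) that rounds real values onto a signed fixed-point grid and reports the residual error that low-precision FxP inference must tolerate:

    import numpy as np

    def quantize_fxp(x, total_bits=8, frac_bits=4):
        # Round onto a signed fixed-point grid with frac_bits fractional bits,
        # saturating at the representable range (illustrative configuration only).
        scale = 2.0 ** frac_bits
        qmin = -(2 ** (total_bits - 1))
        qmax = 2 ** (total_bits - 1) - 1
        q = np.clip(np.round(x * scale), qmin, qmax)
        return q / scale

    x = np.array([0.337, -1.062, 0.015])
    xq = quantize_fxp(x)
    print(xq)        # values snapped to the 1/16 grid
    print(x - xq)    # per-element quantization error

The residual x - xq is the error that grows as bit-widths shrink, and it is exactly this error that Compensated-DNN sets out to offset at run time.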

Complementary to the above approaches, we present Compensated-DNN, wherein we dynamically compensate, during execution, the error introduced by quantization. To this end, we introduce a new fixed-point representation, viz. Fixed Point with Error Compensation (FPEC). The bits in FPEC are split between computation bits and compensation bits. The computation bits use conventional FxP notation to represent the number at low precision. The compensation bits (at most 1 or 2 bits), in contrast, explicitly capture an estimate (direction and magnitude) of the quantization error in the representation. For a given word length, since FPEC uses fewer computation bits than an FxP representation, we achieve a near-quadratic improvement in the energy of multiply-and-accumulate (MAC) operations. The compensation bits are simultaneously used by a low-overhead sparse compensation scheme to estimate the error accrued during MAC operations, which is then added to the MAC output to minimize the impact of quantization. We build Compensated-DNNs for 7 popular image recognition benchmarks with 0.05-20.5 million neurons and 0.01-15.5 billion connections. Based on gate-level analysis at 14nm technology, we achieve 2.65×-4.88× and 1.13×-1.7× improvements in energy over 16-bit and 8-bit FxP implementations respectively, while maintaining <0.5% loss in classification accuracy.
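
To illustrate the mechanism described above, the toy sketch below (a hypothetical Python/NumPy encoding; the paper's actual FPEC format, magnitude estimation, and sparse compensation hardware are not reproduced here) pairs a coarse fixed-point value with a one-bit error direction and a crude magnitude estimate, then adds a first-order correction term to the low-precision MAC result:

    import numpy as np

    def fpec_encode(x, frac_bits=2):
        # Computation bits: conventional coarse fixed-point value.
        scale = 2.0 ** frac_bits
        q = np.round(x * scale) / scale
        # Compensation bit: direction of the quantization error, paired with a
        # fixed, crude estimate of its magnitude (an assumption for this sketch).
        direction = np.sign(x - q)
        est_mag = 0.25 / scale
        return q, direction, est_mag

    def mac_with_compensation(w, a):
        wq, wdir, wmag = fpec_encode(w)
        aq, adir, amag = fpec_encode(a)
        mac = np.dot(wq, aq)  # cheap low-precision dot product
        # First-order error estimate: w.a ~ wq.aq + wq.ea + aq.ew
        correction = np.dot(wq, adir * amag) + np.dot(aq, wdir * wmag)
        return mac + correction

    rng = np.random.default_rng(0)
    w = rng.standard_normal(16) * 0.5
    a = rng.random(16)
    exact = np.dot(w, a)
    uncompensated = np.dot(fpec_encode(w)[0], fpec_encode(a)[0])
    print(exact, uncompensated, mac_with_compensation(w, a))

In the paper the correction is applied sparsely by low-overhead hardware alongside the MAC units; the dense NumPy version here trades that efficiency for readability.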


  • Published in

    DAC '18: Proceedings of the 55th Annual Design Automation Conference
    June 2018, 1089 pages
    ISBN: 9781450357005
    DOI: 10.1145/3195970
    Copyright © 2018 ACM

    Publisher: Association for Computing Machinery, New York, NY, United States
