Abstract
Deep neural networks (DNNs) provide state-of-the-art accuracy in many application domains, such as computer vision and speech recognition. At the same time, DNNs require millions of expensive floating-point operations to process each input, which limits their applicability to resource-constrained systems with tight budgets on hardware design area or power consumption. Our goal is to devise lightweight, approximate accelerators for DNN inference that use fewer hardware resources with negligible reduction in accuracy. To simplify the hardware requirements, we analyze a spectrum of data precisions ranging from fixed-point and dynamic fixed-point, through power-of-two, to binary representations. In conjunction, we provide new training methods that compensate for the simpler hardware resources. To boost the accuracy of the proposed lightweight accelerators, we describe ensemble processing techniques that use an ensemble of lightweight DNN accelerators to achieve the same or better accuracy than the original floating-point accelerator, while still using far fewer hardware resources. Using 65 nm technology libraries and an industrial-strength design flow, we demonstrate a custom hardware accelerator design and training procedure that achieve low power and low latency while incurring insignificant accuracy degradation. We evaluate our design and techniques on the CIFAR-10 and ImageNet datasets and show that significant reductions in power and inference latency are realized.
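To make the precision spectrum named in the abstract concrete, the following is a minimal NumPy sketch of the four weight-quantization schemes (fixed-point, dynamic fixed-point, power-of-two, and binary). The bit-widths, helper names, and the mean-absolute scaling used for binarization are illustrative assumptions, not the chapter's exact accelerator arithmetic.

import numpy as np

def to_fixed_point(w, int_bits=8, frac_bits=8):
    # Quantize to an (int_bits, frac_bits) fixed-point grid with saturation.
    scale = 2.0 ** frac_bits
    max_val = 2.0 ** (int_bits - 1) - 1.0 / scale
    return np.clip(np.round(w * scale) / scale, -max_val, max_val)

def to_dynamic_fixed_point(w, total_bits=8):
    # Choose the integer/fractional split per tensor so the largest magnitude still fits.
    int_bits = max(1, int(np.ceil(np.log2(np.max(np.abs(w)) + 1e-12))) + 1)
    return to_fixed_point(w, int_bits, total_bits - int_bits)

def to_power_of_two(w):
    # Round each weight to the nearest signed power of two (multiplier-free: shifts only).
    return np.sign(w) * 2.0 ** np.round(np.log2(np.abs(w) + 1e-12))

def to_binary(w):
    # Binarize to +/- alpha, with alpha the mean absolute weight (XNOR-Net-style scaling).
    return np.mean(np.abs(w)) * np.sign(w)

# Compare the quantization error of each scheme on a random weight tensor.
w = np.random.randn(4, 4).astype(np.float32)
for quantize in (to_fixed_point, to_dynamic_fixed_point, to_power_of_two, to_binary):
    print(quantize.__name__, float(np.max(np.abs(quantize(w) - w))))

As the schemes move from fixed-point toward binary, the hardware cost of each multiply-accumulate drops (shifts or sign flips replace multipliers), which is the trade-off the lightweight accelerators exploit.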
References
Ba J, Caruana R (2014) Do deep nets really need to be deep? In: Advances in neural information processing systems (NIPS 2014), pp 2654–2662
Bucilua C, Caruana R, Niculescu-Mizil A (2006) Model compression. In: Proceedings of ACM SIGKDD
Chen T, Du Z, Sun N, Wang J, Wu C, Chen Y, Temam O (2014) Diannao: a small-footprint high-throughput accelerator for ubiquitous machine-learning. In: Proceedings of ACM ASPLOS. ACM, New York, pp 269–284
Courbariaux M, Bengio Y, David JP (2014) Low precision arithmetic for deep learning. arXiv:1412.7024
Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: Proceedings of the fourteenth international conference on artificial intelligence and statistics, pp 315–323
Graham B (2014) Fractional max-pooling. arXiv:1412.6071
Gysel P (2016) Ristretto: hardware-oriented approximation of convolutional neural networks. arXiv:1605.06402
Hashemi S, Anthony N, Tann H, Bahar RI, Reda S (2017) Understanding the impact of precision quantization on the accuracy and energy of neural networks. In: Proceedings of DATE
He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. arXiv:1503.02531
Huang G, Liu Z, Weinberger KQ, van der Maaten L (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, vol 1, p 3
Hubara I, Courbariaux M, Soudry D, El-Yaniv R, Bengio Y (2016) Binarized neural networks. In: Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R (eds) Advances in neural information processing systems, vol 29. Curran Associates, New York, pp 4107–4115
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. arXiv:1408.5093
Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto
Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings of NIPS
Lin M, Chen Q, Yan S (2013) Network in network. arXiv:1312.4400
Lin Z, Courbariaux M, Memisevic R, Bengio Y (2015) Neural networks with few multiplications. arXiv:1510.03009
Rastegari M, Ordonez V, Redmon J, Farhadi A (2015) XNOR-Net: ImageNet classification using binary convolutional neural networks. In: European conference on computer vision. Springer, Berlin, pp 525–542
Romero A, Ballas N, Kahou SE, Chassang A, Gatta C, Bengio Y (2014) FitNets: hints for thin deep nets. arXiv:1412.6550
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252
Shafique M, Hafiz R, Javed MU, Abbas S, Sekanina L, Vasicek Z, Mrazek V (2017) Adaptive and energy-efficient architectures for machine learning: challenges, opportunities, and research roadmap. In: 2017 IEEE computer society annual symposium on VLSI (ISVLSI). IEEE, Piscataway, pp 627–632
Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In: Association for the advancement of artificial intelligence, vol 4, p 12
Tann H, Hashemi S, Bahar RI, Reda S (2016) Runtime configurable deep neural networks for energy-accuracy trade-off. arXiv:1607.05418
Tann H, Hashemi S, Bahar RI, Reda S (2017) Hardware-software codesign of accurate, multiplier-free deep neural networks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC). IEEE, Piscataway, pp 1–6
Zagoruyko S, Komodakis N (2016) Wide residual networks. arXiv:1605.07146
Acknowledgements
We would like to thank Professor R. Iris Bahar and N. Anthony for their contributions to this project [8, 25]. Compared to our two previous publications [8, 25], this chapter provides additional experimental results for various quantization schemes and for ensemble deployment. More specifically, the novel contributions of this chapter include implementations of accelerators capable of performing ensemble inference for fixed-point (16,16), fixed-point (8,8), and power-of-two (6,16) precisions. We also evaluate the performance of these accelerators in side-by-side comparisons with those from our previous works in Figs. 14.7 and 14.8, and we generalize our accuracy-boosting ensemble technique to all types of quantized networks, not just dynamic fixed-point. The additional results fill the gaps between our two previous publications and allow for a more complete design space exploration for approximate deep neural network accelerators. This work is supported by NSF grant 1420864 and by generous GPU hardware donations from NVIDIA Corporation.
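For concreteness, the ensemble inference mentioned above can be summarized by a short NumPy sketch that averages class probabilities across several quantized member networks. The per-member forward() interface and the probability-averaging rule are illustrative assumptions, not the accelerators' exact combination logic.

import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class dimension.
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def ensemble_predict(members, x):
    # Average per-member class probabilities, then take the argmax label.
    probs = np.mean([softmax(m.forward(x)) for m in members], axis=0)
    return np.argmax(probs, axis=-1)

Because each member is a lightweight quantized network, several members can run in parallel for less hardware cost than a single floating-point accelerator while recovering (or exceeding) its accuracy.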
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Tann, H., Hashemi, S., Reda, S. (2019). Lightweight Deep Neural Network Accelerators Using Approximate SW/HW Techniques. In: Reda, S., Shafique, M. (eds) Approximate Circuits. Springer, Cham. https://doi.org/10.1007/978-3-319-99322-5_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99321-8
Online ISBN: 978-3-319-99322-5