Research Article
DOI: 10.1145/3404555.3404626

A High Energy-Efficiency Inference Accelerator Exploiting Sparse CNNs

Published: 20 August 2020

Abstract

The rapidly growing computation and memory demands of convolutional neural networks (CNNs) have become a bottleneck for their deployment. Model compression is an efficient method to accelerate CNNs; however, commonly designed architectures are not suitable for compressed models and waste large amounts of computational resources on zero operands. In this work, we propose a flexible CNN inference accelerator on FPGA that utilizes the uniform sparsity introduced by pattern pruning to achieve high performance. Our accelerator architecture exploits different input and output parallelism for sparse computation to maximize the utilization of the computing arrays. A dynamically adjustable mechanism is designed to handle unbalanced workloads, and a novel data buffering structure with slightly rearranged sequences addresses the challenge of access conflicts. Experiments show that our accelerator achieves 316.4 to 343.5 GOP/s on VGG-16 and ResNet-50.
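The page does not detail how pattern pruning produces uniform sparsity, so the following is a minimal sketch, not the authors' implementation: the four-entry pattern library and the `pattern_prune` helper are assumptions for illustration. The idea it shows is that every 3x3 kernel is pruned to one of a few fixed 4-weight masks, so all kernels end up with the same nonzero count.

```python
import numpy as np

# Hypothetical 3x3 pattern library: each mask keeps 4 of 9 weights, so every
# pruned kernel carries an identical number of nonzeros (uniform sparsity).
PATTERNS = [
    np.array([[0, 1, 0],
              [1, 1, 1],
              [0, 0, 0]]),
    np.array([[0, 0, 0],
              [1, 1, 1],
              [0, 1, 0]]),
    np.array([[0, 1, 0],
              [1, 1, 0],
              [0, 1, 0]]),
    np.array([[0, 1, 0],
              [0, 1, 1],
              [0, 1, 0]]),
]

def pattern_prune(weights):
    """For each 3x3 kernel, keep the candidate pattern that preserves the
    most weight magnitude (L1 norm) and zero out everything else.

    weights: (out_channels, in_channels, 3, 3) array.
    Returns the pruned weights and the chosen pattern index per kernel.
    """
    out_ch, in_ch = weights.shape[:2]
    pruned = np.zeros_like(weights)
    choice = np.zeros((out_ch, in_ch), dtype=int)
    for o in range(out_ch):
        for i in range(in_ch):
            kernel = weights[o, i]
            # Score each candidate mask by the magnitude of the weights it keeps.
            scores = [np.abs(kernel * p).sum() for p in PATTERNS]
            best = int(np.argmax(scores))
            choice[o, i] = best
            pruned[o, i] = kernel * PATTERNS[best]
    return pruned, choice

# Every kernel now holds at most 4 nonzeros, regardless of which pattern won.
w = np.random.randn(64, 64, 3, 3)
w_pruned, idx = pattern_prune(w)
assert ((w_pruned != 0).reshape(64 * 64, 9).sum(axis=1) <= 4).all()
```

Because every kernel keeps the same number of weights, each processing element can be assigned an identical number of multiply-accumulates per kernel, which is what makes the balanced parallel scheduling described in the abstract feasible.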

      Published In

      ICCAI '20: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence
      April 2020
      563 pages
      ISBN:9781450377089
      DOI:10.1145/3404555

In-Cooperation

• University of Tsukuba

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. Convolutional neural network
      2. accelerator
      3. pattern pruning
      4. reconfigurable

