Research Article
DOI: 10.1145/3404555.3404626

A High Energy-Efficiency Inference Accelerator Exploiting Sparse CNNs

Published: 20 August 2020

Abstract

The rapidly growing computation and memory demands of convolutional neural networks (CNNs) have become a bottleneck for their deployment. Model compression is an efficient method to accelerate CNNs; however, commonly designed architectures are not suitable for compressed models and waste large amounts of computational resources on zero operands. In this work, we propose a flexible CNN inference accelerator on FPGA that utilizes the uniform sparsity introduced by pattern pruning to achieve high performance. Our accelerator architecture exploits different input and output parallelism for sparse computation to maximize the utilization of the computing arrays. A dynamically adjustable mechanism is designed to handle unbalanced workloads, and a novel data buffering structure with slightly rearranged sequences addresses the challenge of access conflicts. Experiments show that our accelerator achieves 316.4 to 343.5 GOP/s on VGG-16 and ResNet-50.
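The page does not detail how pattern pruning produces uniform sparsity, so the following is a minimal sketch, not the authors' implementation: the four-entry pattern library and the `pattern_prune` helper are assumptions for illustration. The idea it shows is that every 3x3 kernel is pruned to one of a few fixed 4-weight masks, so all kernels end up with the same nonzero count.

```python
import numpy as np

# Hypothetical 3x3 pattern library: each mask keeps 4 of 9 weights, so every
# pruned kernel carries an identical number of nonzeros (uniform sparsity).
PATTERNS = [
    np.array([[0, 1, 0],
              [1, 1, 1],
              [0, 0, 0]]),
    np.array([[0, 0, 0],
              [1, 1, 1],
              [0, 1, 0]]),
    np.array([[0, 1, 0],
              [1, 1, 0],
              [0, 1, 0]]),
    np.array([[0, 1, 0],
              [0, 1, 1],
              [0, 1, 0]]),
]

def pattern_prune(weights):
    """For each 3x3 kernel, keep the candidate pattern that preserves the
    most weight magnitude (L1 norm) and zero out everything else.

    weights: (out_channels, in_channels, 3, 3) array.
    Returns the pruned weights and the chosen pattern index per kernel.
    """
    out_ch, in_ch = weights.shape[:2]
    pruned = np.zeros_like(weights)
    choice = np.zeros((out_ch, in_ch), dtype=int)
    for o in range(out_ch):
        for i in range(in_ch):
            kernel = weights[o, i]
            # Score each candidate mask by the magnitude of the weights it keeps.
            scores = [np.abs(kernel * p).sum() for p in PATTERNS]
            best = int(np.argmax(scores))
            choice[o, i] = best
            pruned[o, i] = kernel * PATTERNS[best]
    return pruned, choice

# Every kernel now holds at most 4 nonzeros, regardless of which pattern won.
w = np.random.randn(64, 64, 3, 3)
w_pruned, idx = pattern_prune(w)
assert ((w_pruned != 0).reshape(64 * 64, 9).sum(axis=1) <= 4).all()
```

Because every kernel keeps the same number of weights, each processing element can be assigned an identical number of multiply-accumulates per kernel, which is what makes the balanced parallel scheduling described in the abstract feasible.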

      Published In

      ICCAI '20: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence
      April 2020
      563 pages
      ISBN:9781450377089
      DOI:10.1145/3404555

In-Cooperation

• University of Tsukuba

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. Convolutional neural network
      2. accelerator
      3. pattern pruning
      4. reconfigurable

