Abstract
Convolutional Neural Networks (CNNs) have been extensively employed in research fields such as multimedia recognition and computer vision. Various FPGA-based accelerators for deep CNNs have been proposed to achieve high energy efficiency. For FPGA-based CNN accelerators in embedded systems, such as UAVs, IoT devices, and wearables, overall performance is largely bounded by the limited data bandwidth to the on-board DRAM. In this paper, we argue that it is feasible to overcome this bandwidth bottleneck using data compression techniques. We propose an effective roofline model to explore the design trade-off between computation logic and data bandwidth after applying data compression to CNN parameters. As case studies, we implement a decompression module together with a CNN accelerator on a single Xilinx VC707 FPGA board using two different compression/decompression algorithms. Under a scenario with limited data bandwidth, our implementation can outperform designs using previous methods by \(3.2{\times }\) in overall performance.
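To make the bandwidth argument concrete, the following is a minimal sketch of how a parameter compression ratio can be folded into the classical roofline bound of Williams et al.; the notation (\(\mathrm{Roof}_{\mathrm{compute}}\), \(\mathrm{CTC}\), \(BW_{\mathrm{DRAM}}\), \(r\)) is illustrative and not necessarily the paper's exact formulation:

\[
\mathrm{Perf}_{\mathrm{attainable}} \;=\; \min\bigl(\,\mathrm{Roof}_{\mathrm{compute}},\;\; \mathrm{CTC} \times r \times BW_{\mathrm{DRAM}}\,\bigr)
\]

where \(\mathrm{Roof}_{\mathrm{compute}}\) is the peak throughput of the computation logic, \(\mathrm{CTC}\) is the computation-to-communication ratio (operations per byte of DRAM traffic), \(BW_{\mathrm{DRAM}}\) is the raw DRAM bandwidth, and \(r \ge 1\) is the compression ratio achieved on the parameters. Under this assumption, compressing the weights by a factor of \(r\) reduces the bytes fetched from DRAM and thus raises the bandwidth roof by the same factor, at the cost of FPGA resources spent on the decompression module.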