Abstract
Convolutional Neural Networks (CNNs) have achieved excellent performance in image classification and have been successfully applied in a wide range of domains. However, their processing power demand poses a challenge to their implementation in embedded real-time applications. To tackle this problem, in this work we focused on the FPGA acceleration of the convolutional layers, since they account for about 90% of the overall computational load. We implemented buffers that reduce the storage of feature maps and, consequently, facilitate the allocation of all the kernel weights in Block-RAMs (BRAMs). Moreover, we used 8-bit kernel weights, rounded from an already trained CNN, to further reduce memory requirements, storing them across multiple BRAMs to increase kernel-loading throughput. To balance the pipeline of convolutions across the convolutional layers, we adjusted the amount of parallel computation in the convolutional step of each layer. We adopted the AlexNet CNN architecture to run our experiments and compare results. We were able to run the inference of the convolutional layers in 3.9 ms at a maximum operating frequency of 76.9 MHz.
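The abstract states that 8-bit kernel weights were obtained by rounding the weights of an already trained CNN, but gives no quantization formula. A minimal sketch of one plausible scheme, symmetric linear quantization to signed 8-bit integers, is shown below; the function name, the per-tensor scale, and the example kernel shape (an AlexNet conv1-style 11×11×3×96 tensor) are illustrative assumptions, not details from the paper.

```python
import numpy as np

def quantize_weights_8bit(w, bits=8):
    """Round trained float weights to signed 8-bit integers using a
    symmetric linear scale (an assumed scheme; the paper only states
    that weights were rounded to 8 bits)."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits
    scale = np.max(np.abs(w)) / qmax           # one scale per kernel tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

# Example: quantize a random conv1-style kernel tensor (11x11x3, 96 filters)
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(11, 11, 3, 96)).astype(np.float32)
q, scale = quantize_weights_8bit(w)

# Dequantize to inspect the rounding error introduced by 8-bit storage
dequant = q.astype(np.float32) * scale
print("max abs rounding error:", np.max(np.abs(dequant - w)))
```

With this scheme the worst-case rounding error is half the scale step, which is what makes storing the full set of kernel weights in on-chip BRAMs feasible at a quarter of the 32-bit float footprint.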
Acknowledgments
Mark Cappello Ferreira de Sousa gratefully acknowledges the National Council for Scientific and Technological Development (CNPq) for partially supporting this research. Mark also thanks Stelvio Henrique Ignacio Barboza, Anelise Scotti Scherer, and the Academic Literacy Laboratory for valuable comments. Miguel Angelo de Abreu de Sousa acknowledges the support of the Federal Institute of Education, Science and Technology of São Paulo (IFSP).
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
de Sousa, M.C.F., de Abreu de Sousa, M.A., Del-Moral-Hernandez, E. (2018). Balancing Convolutional Neural Networks Pipeline in FPGAs. In: Kůrková, V., Manolopoulos, Y., Hammer, B., Iliadis, L., Maglogiannis, I. (eds) Artificial Neural Networks and Machine Learning – ICANN 2018. ICANN 2018. Lecture Notes in Computer Science(), vol 11139. Springer, Cham. https://doi.org/10.1007/978-3-030-01418-6_17
Print ISBN: 978-3-030-01417-9
Online ISBN: 978-3-030-01418-6
eBook Packages: Computer Science (R0)