Abstract
The success of Convolutional Neural Networks (CNNs) in computer vision places continuing pressure on performance in both training and inference. Various software optimizations have been examined for existing hardware such as CPUs and GPUs to meet the computational needs; however, the gap between ideal and achieved performance will persist without dedicated hardware support. In this paper, we propose a customized CNN processor built by extending the RISC-V instruction set. We add six primary instructions, derived by analyzing and abstracting the characteristics of conventional CNN models. The target micro-architecture is upgraded accordingly to exploit the parallelism in the massive data accesses. We evaluate our design with the widely used CNN model LeNet-5 on a Field Programmable Gate Array (FPGA) to validate correctness. Compared with the traditional x86 and MIPS ISAs, our design provides higher code density and performance efficiency.
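To make the workload concrete, the following is a minimal sketch of the multiply-accumulate loop nest at the heart of a convolution layer, the kind of regular, data-heavy pattern that CNN-oriented instruction set extensions typically target. It is not the paper's instruction set or micro-architecture, which the abstract does not detail. The function name `conv2d_valid` and the all-ones test data are hypothetical; the 32x32 input and 5x5 kernel sizes are chosen to match the first convolution layer of LeNet-5 as commonly described.

```python
# Illustrative only: a direct 2D convolution written as plain loops.
# All names here are hypothetical and not taken from the paper.

def conv2d_valid(image, kernel):
    """Stride-1, no-padding 2D convolution (cross-correlation form).

    Returns the output feature map and the number of
    multiply-accumulate (MAC) operations performed.
    """
    in_h, in_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1
    output = [[0.0] * out_w for _ in range(out_h)]
    macs = 0
    for oy in range(out_h):              # output rows
        for ox in range(out_w):          # output columns
            acc = 0.0
            for ky in range(k_h):        # kernel rows
                for kx in range(k_w):    # kernel columns
                    acc += image[oy + ky][ox + kx] * kernel[ky][kx]
                    macs += 1
            output[oy][ox] = acc
    return output, macs


# Assumed sizes: 32x32 single-channel input, one 5x5 kernel.
image = [[1.0] * 32 for _ in range(32)]
kernel = [[1.0] * 5 for _ in range(5)]
feature_map, mac_count = conv2d_valid(image, kernel)
print(len(feature_map), len(feature_map[0]), mac_count)
```

Every output element reads an overlapping 5x5 window of the input, so neighbouring iterations reuse most of the same data and are independent of one another. That combination of heavy data access and abundant parallelism is what the abstract refers to.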
Acknowledgement
This work was supported by Science Foundation Ireland grant 13/RC/2094 to Lero - The Irish Software Research Centre.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Jiao, Q., Hu, W., Wen, Y., Dong, Y., Li, Z., Gan, Y. (2020). Design of a Convolutional Neural Network Instruction Set Based on RISC-V and Its Microarchitecture Implementation. In: Qiu, M. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2020. Lecture Notes in Computer Science, vol 12453. Springer, Cham. https://doi.org/10.1007/978-3-030-60239-0_6
DOI: https://doi.org/10.1007/978-3-030-60239-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60238-3
Online ISBN: 978-3-030-60239-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)