RV-CNN: Flexible and Efficient Instruction Set for CNNs Based on RISC-V Processors

  • Conference paper
  • Conference series: Advanced Parallel Processing Technologies (APPT 2019)
  • Book series: Lecture Notes in Computer Science (LNTCS, volume 11719)

Abstract

Convolutional Neural Networks (CNNs) have gained significant attention in machine learning, particularly for their high accuracy in character recognition and image classification. However, because CNNs are both computation-intensive and memory-intensive, general-purpose processors, which must support a wide variety of workloads, are inefficient for CNN implementation. A growing number of CNN-specific hardware accelerators improve efficiency, but existing accelerators are often inflexible or require complex controllers to manage computation and data transfer. In this paper, we analyze classical CNN applications and design RV-CNN, a domain-specific instruction set of nine matrix instructions based on the promising RISC-V architecture. By abstracting CNN computation into instructions, our design achieves higher code density and offers greater flexibility and efficiency for CNNs than general-purpose ISAs. The proposed instructions are added to the RISC-V ISA as custom instructions. We also introduce micro-architectural optimizations that increase computational density and reduce the required memory bandwidth. Finally, we implement the architecture with the extended ISA and evaluate it with LeNet-5 on the MNIST, Caltech101, and Cifar-10 datasets. Results show that, compared with an Intel Core i7 processor and a Tesla K40c GPU, our design achieves 36.09x and 11.42x better energy efficiency and 6.70x and 1.25x higher code density, respectively.
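Only the abstract is available in this preview, but the central claim — that a small set of matrix instructions can cover CNN workloads — rests on the well-known lowering of convolution to matrix multiplication (im2col, cf. the paper's focus on matrix instructions). Below is a minimal NumPy sketch of that lowering; the function names and shapes are illustrative assumptions, not the paper's actual RV-CNN primitives.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold a (C, H, W) input into a (C*kh*kw, out_h*out_w) matrix
    so that convolution becomes a single matrix multiply."""
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    idx = 0
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                # All output positions that read input offset (ci, i, j).
                cols[idx] = x[ci, i:i + out_h, j:j + out_w].reshape(-1)
                idx += 1
    return cols

def conv_as_matmul(x, weights):
    """weights: (F, C, kh, kw) -> output (F, out_h, out_w), stride 1, no padding."""
    f, c, kh, kw = weights.shape
    out_h, out_w = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    w_mat = weights.reshape(f, c * kh * kw)   # one row per filter
    out = w_mat @ im2col(x, kh, kw)           # the single matrix multiply
    return out.reshape(f, out_h, out_w)

# Example: a LeNet-5-style 5x5 convolution over a 1x28x28 (MNIST-sized) input.
x = np.random.rand(1, 28, 28).astype(np.float32)
w = np.random.rand(6, 1, 5, 5).astype(np.float32)
print(conv_as_matmul(x, w).shape)  # (6, 24, 24)
```

Once convolutional and fully connected layers are both expressed as matrix multiplies, a handful of matrix instructions plus activation/pooling primitives can plausibly cover an entire inference pass, which is the flexibility argument the abstract makes.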
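The abstract also notes that the nine instructions are extended to RISC-V as custom instructions. RISC-V reserves major opcodes (e.g., custom-0 = 0b0001011) precisely for such vendor extensions, so a new R-type instruction is just a specific packing of the standard fields. The sketch below encodes one hypothetical matrix instruction; the mnemonic, funct values, and operand semantics are invented for illustration and are not taken from the paper.

```python
def encode_r_type(opcode, rd, funct3, rs1, rs2, funct7):
    """Pack a 32-bit RISC-V R-type instruction word:
    funct7[31:25] | rs2[24:20] | rs1[19:15] | funct3[14:12] | rd[11:7] | opcode[6:0]."""
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)

CUSTOM_0 = 0b0001011  # major opcode reserved by the RISC-V spec for custom extensions

# Hypothetical "matrix multiply" instruction: rs1/rs2 hold the base addresses of
# the operand matrices (e.g., in a scratchpad), rd names the result register.
# The funct3/funct7 values below are assumed, not the paper's encodings.
MMUL_FUNCT3, MMUL_FUNCT7 = 0b000, 0b0000001
word = encode_r_type(CUSTOM_0, rd=10, funct3=MMUL_FUNCT3, rs1=11, rs2=12,
                     funct7=MMUL_FUNCT7)
print(f"0x{word:08x}")  # such a word could be emitted via the GNU assembler's .insn directive
```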



Acknowledgments

This work is partially supported by the National Key Research and Development Program of China (under Grant 2017YFA0700900), National Science Foundation of China (No. 61772482), Jiangsu Provincial Natural Science Foundation (No. BK20181193), Youth Innovation Promotion Association CAS (No. 2017497), and Fundamental Research Funds for the Central Universities (WK2150110003).

Author information

Correspondence to Chao Wang.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Lou, W., Wang, C., Gong, L., Zhou, X. (2019). RV-CNN: Flexible and Efficient Instruction Set for CNNs Based on RISC-V Processors. In: Yew, PC., Stenström, P., Wu, J., Gong, X., Li, T. (eds) Advanced Parallel Processing Technologies. APPT 2019. Lecture Notes in Computer Science, vol. 11719. Springer, Cham. https://doi.org/10.1007/978-3-030-29611-7_1

  • DOI: https://doi.org/10.1007/978-3-030-29611-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29610-0

  • Online ISBN: 978-3-030-29611-7

  • eBook Packages: Computer Science; Computer Science (R0)
