DOI: 10.1145/3649153.3649181
extended-abstract

RV-GEMM: Neural Network Inference Acceleration with Near-Memory GEMM Instructions on RISC-V

Published: 02 July 2024

Abstract

General Matrix Multiply (GEMM), a fundamental operation in neural networks, plays an important role in artificial intelligence and signal processing applications. In this paper, we proposed three SIMD RISC-V custom instructions to accelerate GEMM computations, supporting multiple precisions: 32-bit, 16-bit and 8-bit fixed-point. Furthermore, we implemented address calculation and loop control units alongside the GEMM acceleration module to reduce memory access overhead. The three GEMM custom instructions and the near-memory optimization units were incorporated into the RV-GEMM processor, which was implemented on an FPGA platform for speedup evaluation. The design was also synthesized with Synopsys Design Compiler in a 55 nm CMOS process to estimate hardware overhead. Compared to the baseline RISC-V processor, the RV-GEMM processor achieved speedup ratios of 15.8×, 28.7× and 42.5× for GEMM computations at 32-bit, 16-bit and 8-bit fixed-point precision, respectively. The peak energy efficiency reached 260 GOPS/W, 420 GOPS/W and 609 GOPS/W, respectively.
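To make the idea of a packed multi-precision GEMM step concrete, the following is a minimal software sketch in C, not the RV-GEMM instruction semantics. The abstract does not define the instructions' encoding or lane layout, so everything here is an assumption for illustration: the helper names `mac4_i8` and `gemm_i8` are hypothetical, a 32-bit word is assumed to hold four signed 8-bit lanes, and the second operand is assumed to be stored transposed so that both operands are contiguous along the reduction dimension.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one packed multiply-accumulate step:
 * each 32-bit word is treated as four signed 8-bit lanes, and the
 * four lane products are added into a 32-bit accumulator. */
static int32_t mac4_i8(uint32_t a, uint32_t b, int32_t acc) {
    for (int lane = 0; lane < 4; ++lane) {
        int8_t x = (int8_t)((a >> (8 * lane)) & 0xFFu);
        int8_t y = (int8_t)((b >> (8 * lane)) & 0xFFu);
        acc += (int32_t)x * (int32_t)y;
    }
    return acc;
}

/* C (M x N, row-major) = A (M x K, row-major) * B, where B is passed
 * as Bt (N x K, row-major), i.e. already transposed. K must be a
 * multiple of 4 so that each step consumes one full packed word. */
static void gemm_i8(int M, int N, int K,
                    const int8_t *A, const int8_t *Bt, int32_t *C) {
    assert(K % 4 == 0);
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            int32_t acc = 0;
            for (int k = 0; k < K; k += 4) {
                uint32_t wa, wb;
                /* Load four consecutive 8-bit elements as one word. */
                memcpy(&wa, &A[i * K + k], sizeof wa);
                memcpy(&wb, &Bt[j * K + k], sizeof wb);
                acc = mac4_i8(wa, wb, acc);
            }
            C[i * N + j] = acc;
        }
    }
}
```

In this model the 8-bit case handles four multiply-accumulates per word, while a 16-bit variant would handle two and a 32-bit variant one. The index arithmetic and loop counters that remain in software here are the kind of overhead that the abstract's address calculation and loop control units are said to reduce.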

Published In

CF '24: Proceedings of the 21st ACM International Conference on Computing Frontiers
May 2024
345 pages
ISBN:9798400705977
DOI:10.1145/3649153
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Custom Instruction Sets
  2. GEMM
  3. Hardware Acceleration
  4. RISC-V

Qualifiers

  • Extended-abstract
  • Research
  • Refereed limited

Conference

CF '24
Acceptance Rates

CF '24 Paper Acceptance Rate 33 of 105 submissions, 31%;
Overall Acceptance Rate 273 of 785 submissions, 35%

Bibliometrics

  • Total Citations: 0
  • Total Downloads: 171
  • Downloads (Last 12 months): 171
  • Downloads (Last 6 weeks): 27

Reflects downloads up to 11 Feb 2025
