ABSTRACT
Recent advances in artificial intelligence (AI) technology have led to a new round of innovation in systolic structures. Many AI accelerators employ a systolic structure to realize the core large-scale matrix-vector multiplication for high-performance processing. This structure has a complexity of $O(n^2)$ for a matrix of size $n\times n$, which makes it difficult to implement on a field-programmable gate array (FPGA) platform. To overcome this drawback, we propose a super systolization strategy that implements the core circulant matrix-vector multiplication as a systolic structure with subquadratic space complexity. The work proceeds in two interdependent stages: (i) a novel matrix-vector multiplication algorithm based on the Toeplitz matrix-vector product (TMVP) approach is proposed to obtain subquadratic space complexity; (ii) a series of optimization techniques is introduced to map the proposed algorithm onto the desired systolic structure. Finally, a detailed complexity analysis and comparison are presented to demonstrate the efficiency of the proposed strategy. The proposed strategy is highly efficient and can be extended to many neural-network-based hardware implementation platforms.
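To illustrate how a TMVP approach can reach subquadratic complexity, the following is a minimal sketch of the textbook two-way TMVP split applied to a circulant matrix. It is not the algorithm proposed in the paper and says nothing about its systolic mapping. The function names (`tmvp`, `circulant_matvec`, `naive_circulant_matvec`), the diagonal-array encoding of the Toeplitz matrix, and the restriction to power-of-two sizes are all assumptions made for illustration.

```python
# Sketch only: generic two-way TMVP recursion, not the paper's algorithm.
# A Toeplitz matrix T of size n x n is stored by its 2n-1 diagonals as
# t[k] = T[i][j] with k = i - j + n - 1 (an encoding assumed for this example).

def tmvp(t, v):
    """Multiply an n x n Toeplitz matrix (diagonal array t) by vector v.

    n must be a power of two. Each level replaces four half-size
    block products with three, giving O(n^log2(3)) multiplications.
    """
    n = len(v)
    assert len(t) == 2 * n - 1
    if n == 1:
        return [t[0] * v[0]]
    m = n // 2
    # Block view: T = [[T1, T0], [T2, T1]], V = [V0, V1].
    t0 = t[0:2 * m - 1]          # top-right block
    t1 = t[m:3 * m - 1]          # diagonal blocks
    t2 = t[2 * m:4 * m - 1]      # bottom-left block
    v0, v1 = v[:m], v[m:]
    p0 = tmvp([a + b for a, b in zip(t0, t1)], v1)   # (T0 + T1) V1
    p1 = tmvp([a + b for a, b in zip(t1, t2)], v0)   # (T1 + T2) V0
    p2 = tmvp(t1, [a - b for a, b in zip(v0, v1)])   # T1 (V0 - V1)
    top = [a + b for a, b in zip(p0, p2)]            # T1 V0 + T0 V1
    bottom = [a - b for a, b in zip(p1, p2)]         # T2 V0 + T1 V1
    return top + bottom


def circulant_matvec(c, v):
    """Circulant product with C[i][j] = c[(i - j) mod n], via tmvp."""
    n = len(c)
    # A circulant matrix is a Toeplitz matrix whose diagonals wrap around.
    t = [c[(k - (n - 1)) % n] for k in range(2 * n - 1)]
    return tmvp(t, v)


def naive_circulant_matvec(c, v):
    """Reference O(n^2) product used only to check the sketch."""
    n = len(c)
    return [sum(c[(i - j) % n] * v[j] for j in range(n)) for i in range(n)]
```

The recursion trades one of the four half-size block products for a few vector additions, which is where the subquadratic operation count comes from; a hardware design would still need to decide how these additions and the three sub-products are scheduled.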
Index Terms
- Embracing Systolic: Super Systolization of Large-Scale Circulant Matrix-vector Multiplication on FPGA with Subquadratic Space Complexity