Abstract:
Systolic arrays have been a crucial architecture for accelerating convolutional neural networks (CNNs) since the success of Google’s TPU (Tensor Processing Unit). In this work, we propose a high-throughput, low-delay dual-line systolic array for accelerating CNNs. With a line-by-line, vector-style systolic dataflow, the peripheral circuitry is simplified and the loading/offloading delay is greatly reduced. In addition, to take full advantage of the INT8 computation capability of the DSP (digital signal processing) blocks in the FPGA, a dual-line systolic array is developed, which doubles the computation throughput. Finally, the proposed accelerator is deployed on a PYNQ-Z2 board to accelerate the VGG16 network in practice. The peak throughput of the convolution layers reaches 107.21 GOPS, which exceeds all previous works on the same hardware platform.
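The abstract does not spell out how the dual-line array extracts two INT8 operations from each DSP block. One commonly used approach, shown here only as an illustrative assumption and not necessarily the authors' design, is to pack two INT8 operands that share a common factor into a single wide multiplication and then split the result. The function name `packed_int8_mul` and the 18-bit field width are hypothetical choices for this sketch.

```python
# Sketch (assumption): two INT8 products a*c and b*c from ONE wide multiply,
# modelling in software how a wide hardware multiplier could be shared.
# The 18-bit offset is an illustrative choice, large enough to hold any
# signed INT8 x INT8 product (range -16256 .. 16384).

SHIFT = 18                      # bit offset separating the two packed products
MASK = (1 << SHIFT) - 1         # mask for the low field


def packed_int8_mul(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (a*c, b*c) using a single multiplication."""
    for v in (a, b, c):
        if not -128 <= v <= 127:
            raise ValueError("operands must be signed INT8")

    packed = (a << SHIFT) + b   # place a in the high field, b in the low field
    product = packed * c        # the single wide multiply

    # The low field holds b*c in two's complement; sign-extend it.
    low = product & MASK
    if low >= 1 << (SHIFT - 1):
        low -= 1 << SHIFT

    # Removing the low field also undoes the borrow it caused in the high field.
    high = (product - low) >> SHIFT
    return high, low


if __name__ == "__main__":
    print(packed_int8_mul(3, -5, 7))
```

In a systolic array this would correspond to two lines of activations (or weights) sharing one operand per processing element, which is consistent with the doubled throughput the abstract reports, although the actual mapping is described in the full paper rather than here.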
Published in: 2022 IEEE 30th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
Date of Conference: 15-18 May 2022
Date Added to IEEE Xplore: 03 June 2022