Abstract:
This brief first proposes a novel position-first dataflow that streamlines the interconnections between the convolution engines (CEs) and the accumulators of the architecture. In addition, an input channel merging method is devised to address the low utilization and unstable workload of the CEs. Building on these two schemes, a sparse-based acceleration architecture with an expanded-scale convolution array is designed and implemented on the Xilinx VCU118 platform, achieving a runtime frequency of 300 MHz. Compared with current sparse-based works, the proposed architecture achieves a $1.10\times$ to $3.95\times$ speedup in actual performance and a $1.10\times$ to $2.52\times$ improvement in DSP efficiency when VGG16 is applied.
Published in: IEEE Transactions on Circuits and Systems II: Express Briefs (Volume: 71, Issue: 7, July 2024)