A 200MHZ 202.4GFLOPS@10.8W VGG16 accelerator in Xilinx VX690T | IEEE Conference Publication | IEEE Xplore


Abstract:

Convolutional Neural Networks (CNNs) are among the most powerful and widely used algorithms for computer vision applications, despite their computation-demanding and memory-intensive operations. The cost of CNN inference stems from the heavy cross-channel computation of convolutional (CONV) layers and the massive parameter retrieval of fully-connected (FC) layers, respectively. In this paper, to remove the inter-filter redundancy, we constructed and tuned specific low-rank filters in the fully-connected layers. The proposed rank reduction saves 88.9% of both the arithmetic operations and the parameters of the fully-connected layers in the VGG16 model. In addition, by employing a network-layer-wise ping-pong DDR access mode, tile-grain on-chip feature map buffers, and a Propagate Partial Multiply-Accumulate (PPMAC) processor, we implemented a 202.4 GFLOPS CNN accelerator with half-precision data format on the Xilinx VC709 evaluation board. Experiments show that the accelerator achieved a throughput of 6.58 fps with 0.7046 top-1 accuracy and 0.8977 top-5 accuracy at a 200 MHz working frequency.
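The figures in the abstract can be sanity-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper. It assumes the rank reduction behaves like a plain two-factor decomposition of an m x n weight matrix into an m x r and an r x n matrix, which may differ from the authors' construction. The 4096 x 4096 layer and the rank of 256 are hypothetical values chosen for the example. The derived ratios further assume that the 202.4 GFLOPS rate, the 10.8 W figure from the title, and the 6.58 fps throughput all describe the same operating point.

```python
def low_rank_savings(m: int, n: int, r: int) -> float:
    """Fraction of parameters (and multiply-accumulates) saved when an
    m x n weight matrix is replaced by the product of an m x r matrix
    and an r x n matrix. Assumed factorization, not the paper's."""
    dense = m * n              # parameters in the original layer
    factored = r * (m + n)     # parameters in the two low-rank factors
    return 1.0 - factored / dense


# Headline figures quoted in the title and abstract.
PEAK_GFLOPS = 202.4   # reported accelerator performance
POWER_W = 10.8        # power figure from the title
FPS = 6.58            # reported VGG16 throughput

# Derived quantities (assumption: all three figures were measured together).
gflops_per_watt = PEAK_GFLOPS / POWER_W   # energy efficiency
gflop_per_frame = PEAK_GFLOPS / FPS       # work per frame implied by the rate

if __name__ == "__main__":
    # Hypothetical layer shape and rank, for illustration only.
    print(f"savings at r=256: {low_rank_savings(4096, 4096, 256):.3f}")
    print(f"efficiency: {gflops_per_watt:.2f} GFLOPS/W")
    print(f"implied work: {gflop_per_frame:.2f} GFLOP per frame")
```

Under this assumed factorization, a saving of 88.9% (about 8/9) corresponds to the factors holding roughly one ninth of the original parameters, that is r(m + n) ≈ mn/9.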
Date of Conference: 14-16 November 2017
Date Added to IEEE Xplore: 08 March 2018
Conference Location: Montreal, QC, Canada
