A 200MHZ 202.4GFLOPS@10.8W VGG16 accelerator in Xilinx VX690T

Abstract:

Convolutional Neural Networks (CNNs) are among the most powerful and widely used algorithms for computer vision applications, despite their computation-demanding and memory-intensive operations. The cost of CNN operation stems from the bulky cross-channel computation of convolutional (CONV) layers and the massive parameter retrieval of fully-connected (FC) layers. In this paper, to remove inter-filter redundancy, we constructed and tuned specific low-rank filters in the fully-connected layers. The proposed rank reduction saves 88.9% of both the arithmetic operations and the parameters of the fully-connected layers in the VGG16 model. In addition, by employing a network-layer-wise ping-pong DDR access mode, tile-grain on-chip feature map buffers, and a Propagate Partial Multiply-Accumulate (PPMAC) processor, we implemented a 202.4 GFLOPS CNN accelerator with a half-precision data format on the Xilinx VC709 evaluation board. Experiments show that the accelerator achieved a throughput of 6.58 fps with 0.7046 top-1 accuracy and 0.8977 top-5 accuracy at a 200 MHz working frequency.
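
The abstract does not say how the low-rank filters are constructed, so the following is only a minimal sketch of the general idea of rank reduction in a fully-connected layer, not the authors' method. It assumes truncated SVD as the factorization (a common stand-in), uses toy layer dimensions, and the helper names `low_rank_factorize` and `savings` are hypothetical. A dense weight matrix of shape (out, in) is replaced by two factors of shapes (out, rank) and (rank, in), so both the parameter count and the multiply-accumulate count drop from out*in to rank*(out + in).

```python
import numpy as np


def low_rank_factorize(weight, rank):
    """Approximate weight (out x in) as a @ b using truncated SVD.

    Assumption: truncated SVD is used here for illustration only; the
    paper's own construction and tuning procedure is not described.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold singular values into the left factor
    b = vt[:rank, :]
    return a, b


def savings(out_dim, in_dim, rank):
    """Fraction of parameters (and MACs) saved by the factorized layer."""
    dense = out_dim * in_dim
    factored = rank * (out_dim + in_dim)
    return 1.0 - factored / dense


# Toy dimensions (hypothetical, far smaller than the VGG16 FC layers).
rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 128))
a, b = low_rank_factorize(weight, rank=8)

x = rng.standard_normal(128)
y_dense = weight @ x   # 64 * 128 multiply-accumulates
y_low = a @ (b @ x)    # 8 * (64 + 128) multiply-accumulates

print(f"saved fraction: {savings(64, 128, 8):.4f}")
```

In this toy setting the saved fraction depends only on the chosen rank and the layer shape; the 88.9% figure reported above comes from the paper's own choice of filters and is not reproduced by this sketch. A random matrix like the one used here has no low-rank structure, so `y_low` is only a rough approximation of `y_dense`; trained FC weights are the case where such redundancy is expected to exist.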

Published in: 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP)

Date of Conference: 14-16 November 2017

Date Added to IEEE Xplore: 08 March 2018

DOI: 10.1109/GlobalSIP.2017.8309067

Conference Location: Montreal, QC, Canada