Abstract:
As convolutional neural networks (CNNs) become increasingly diverse and complex, CNN acceleration faces a growing bottleneck in balancing performance, energy efficiency, and flexibility within a unified architecture. This paper proposes a Winograd-based, highly efficient, and dynamically Reconfigurable Accelerator (named WRA) for rapidly evolving CNN models. A cost-effective convolution decomposition method (CDW) is proposed, which extends the applicability of the fast Winograd algorithm. Based on CDW, a high-throughput and reconfigurable processing element (PE) array is designed to exploit the parallelism of the Winograd algorithm. In addition, a highly compact memory structure employs four levels of data reuse to maximize data reuse and minimize external bandwidth requirements. With its dynamic reconfigurability, WRA implements CDW and other convolution types (e.g., standard convolution, depthwise separable convolution, and group convolution) on a unified hardware architecture. The WRA accelerator is implemented on a Xilinx XCVU9P platform running at a 330 MHz clock frequency, controlled by a POWER8 processor via the coherent accelerator processor interface (CAPI). Under different configurations, WRA provides 2.2-6.3 TOPS of performance across convolution shapes. The average performance and energy efficiency for VGG16/AlexNet/MobileNetV1/MobileNetV2 are 5288 GOP/s at 151.2 GOPS/W, 3478 GOP/s at 99.4 GOPS/W, 2674 GOP/s at 76.4 GOPS/W, and 2194 GOP/s at 62.7 GOPS/W, respectively. WRA achieves a 1.7x-24x speedup compared with previous FPGA-based designs.
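The abstract does not specify the Winograd tile size used by WRA, but as a reference point for the fast algorithm it builds on, the sketch below shows the commonly used F(2x2, 3x3) Winograd transform in NumPy: a 4x4 input tile and a 3x3 kernel are transformed, multiplied element-wise (16 multiplications instead of 36 for direct convolution), and transformed back to a 2x2 output tile. The function name and the self-check against direct cross-correlation are illustrative, not taken from the paper.

```python
import numpy as np

# Standard F(2x2, 3x3) Winograd minimal-filtering transform matrices.
B_T = np.array([[1, 0, -1,  0],
                [0, 1,  1,  0],
                [0,-1,  1,  0],
                [0, 1,  0, -1]], dtype=np.float32)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float32)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=np.float32)

def winograd_f2x2_3x3(tile, kernel):
    """Compute a 2x2 output tile from a 4x4 input tile and a 3x3 kernel."""
    U = G @ kernel @ G.T      # 4x4 transformed kernel
    V = B_T @ tile @ B_T.T    # 4x4 transformed input tile
    M = U * V                 # element-wise product: 16 multiplies vs. 36 direct
    return A_T @ M @ A_T.T    # inverse transform to the 2x2 output tile

# Self-check against direct (cross-correlation) convolution on one tile.
d = np.random.rand(4, 4).astype(np.float32)
g = np.random.rand(3, 3).astype(np.float32)
ref = np.array([[np.sum(d[i:i+3, j:j+3] * g) for j in range(2)]
                for i in range(2)])
assert np.allclose(winograd_f2x2_3x3(d, g), ref, atol=1e-5)
```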
Published in: IEEE Transactions on Circuits and Systems I: Regular Papers (Volume 66, Issue 9, September 2019)