Abstract:
This paper describes the architecture of an FPGA-based high-performance training accelerator for neural networks. Our accelerator uses a hybrid of embedded floating-point and soft-logic resources to implement truncated floating-point datapaths, including bfloat16 and bfloat14. The proposed multi-layer perceptron (MLP) training architecture showcases a general methodology for developing high-performance accelerators written in OpenCL, incorporating a systolic-array GEMM engine with off-chip memory interfaces. The accelerator is capable of 5 TFLOPS on a mid-range FPGA device and achieves over 90% of peak efficiency during training, demonstrating the versatility of FPGAs as neural network training accelerators.
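As background for the truncated formats named in the abstract, the sketch below shows in plain C what "truncated floating point" means in software terms: a bfloat16 value keeps only the sign, 8-bit exponent, and top 7 mantissa bits of an IEEE-754 float32. The further-truncated bfloat14 variant is assumed here to drop additional mantissa bits; the paper's datapaths are FPGA hardware, and this snippet is only an illustration, not the authors' implementation.

/* Illustrative sketch: bfloat16 as a truncation of IEEE-754 float32. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Truncate a float32 to bfloat16 by keeping its upper 16 bits
   (sign, 8-bit exponent, 7 mantissa bits). */
static uint16_t float_to_bfloat16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret bits without UB */
    return (uint16_t)(bits >> 16);    /* drop the low 16 mantissa bits */
}

/* Expand a bfloat16 back to float32 by zero-filling the dropped bits. */
static float bfloat16_to_float(uint16_t b) {
    uint32_t bits = (uint32_t)b << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

int main(void) {
    float x = 3.14159265f;
    uint16_t bf = float_to_bfloat16(x);
    printf("float32 %.8f -> bfloat16 0x%04x -> %.8f\n",
           x, bf, bfloat16_to_float(bf));
    return 0;
}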
Date of Conference: 09-11 December 2019
Date Added to IEEE Xplore: 13 February 2020