Abstract:
Adapting CNNs to changing problems is challenging on resource-limited edge devices due to intensive computations, high precision requirements, large storage needs, and high bandwidth demands. This paper presents BOOST, a novel block minifloat (BM)-based parallel CNN training accelerator on memory- and computation-constrained FPGAs for transfer learning (TL). By updating a small number of layers online, BOOST enables adaptation to changing problems. Our approach utilizes a unified 8-bit BM datatype (bm(2,5)), i.e., with a sign bit, 2 exponent bits, and 5 mantissa bits, and proposes unified Conv and dilated Conv blocks that support non-unit stride and enable task-level parallelism during back-propagation to minimize latency. For ResNet20 and VGG-like network training on the CIFAR-10 and SVHN datasets, BOOST achieves near 32-bit floating-point accuracy while reducing latency by 21%-43% and BRAM usage by 63%-66% compared to back-propagation training without TL. Notably, BOOST outperforms prior state-of-the-art works, achieving per-batch throughputs of 131 and 209 GOPs for ResNet20 and the VGG-like network, respectively.
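To make the bm(2,5) datatype concrete, the sketch below decodes one 8-bit block minifloat element (1 sign bit, 2 exponent bits, 5 mantissa bits) given a per-block shared exponent bias. This is an illustrative assumption of a typical block minifloat convention (implicit leading one for normals, subnormals at exponent field 0), not the paper's FPGA implementation; the function name and bias handling are hypothetical.

```python
# Illustrative sketch of decoding an 8-bit bm(2,5) block minifloat element.
# Layout assumption: [sign | 2-bit exponent | 5-bit mantissa], with the
# exponent interpreted relative to a shared per-block bias.
EXP_BITS, MAN_BITS = 2, 5

def decode_bm25(byte_val: int, shared_bias: int) -> float:
    """Decode one bm(2,5) byte using the block's shared exponent bias."""
    sign = (byte_val >> (EXP_BITS + MAN_BITS)) & 0x1
    exp = (byte_val >> MAN_BITS) & ((1 << EXP_BITS) - 1)
    man = byte_val & ((1 << MAN_BITS) - 1)
    if exp == 0:
        # Assumed subnormal encoding: no implicit leading 1,
        # smallest normal exponent.
        frac, e = man / (1 << MAN_BITS), 1
    else:
        # Normal encoding: implicit leading 1.
        frac, e = 1.0 + man / (1 << MAN_BITS), exp
    return ((-1) ** sign) * frac * 2.0 ** (e - 1 + shared_bias)

# Every element of a block reuses one shared bias, so only 8 bits per
# element are stored alongside a single per-block exponent.
block = [0b0_10_10000, 0b1_01_00100]
print([decode_bm25(b, shared_bias=-3) for b in block])  # [0.375, -0.140625]
```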
Date of Conference: 28 October 2023 - 02 November 2023
Date Added to IEEE Xplore: 30 November 2023