Abstract:
In this paper, we investigate a hardware approach for on-device training and inference targeting fully quantized graph convolutional networks (GCNs). Our proposed solution leverages a specialized hardware accelerator consisting of a streaming architecture and adaptive fixed-point numeric precision. The accelerator offers scalable performance via a variable number of independent hardware threads and compute units per thread. During training, the architecture widens the data path in the backward pass to maintain the gradient accuracy needed for backpropagation. In contrast, during the forward pass, the accelerator narrows the data path to emulate the uncertainty introduced by the quantized parameters. We use the popular Planetoid datasets to benchmark the accelerator, achieving valid precisions down to 1 bit for weights and features and 2 bits for the adjacency matrix. The performance gains over an optimized PyTorch software solution exceed two orders of magnitude for inference and one order of magnitude for training. A comparison with previous GCN accelerators designed for inference-only mode and based on HPC (High Performance Computing) FPGA platforms shows competitive performance.
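The narrow-forward / wide-backward scheme the abstract describes resembles the fake-quantization (straight-through estimator) pattern common in quantized training: the forward pass operates on low-precision operands, while gradients are computed against full-precision values. The sketch below is an illustrative NumPy toy, not the paper's actual hardware scheme; the `quantize` function, bit-widths, and layer shapes are assumptions for demonstration.

```python
import numpy as np

def quantize(x, bits):
    # Uniform symmetric quantizer over [-max|x|, max|x|].
    # With bits=1 this collapses to sign-like binarization
    # (illustrative only; the paper's exact scheme is not specified here).
    scale = np.max(np.abs(x)) + 1e-12
    levels = 2 ** (bits - 1)
    return np.round(x / scale * levels) / levels * scale

# Toy GCN layer: H_out = A_hat @ H @ W.
# Forward pass uses quantized tensors (narrow data path);
# backward pass keeps full-precision values (wide data path),
# mimicking the straight-through-estimator training pattern.
rng = np.random.default_rng(0)
A_hat = quantize(rng.random((4, 4)), bits=2)        # adjacency at 2 bits
H = quantize(rng.standard_normal((4, 8)), bits=1)   # features at 1 bit
W = rng.standard_normal((8, 3))                     # full-precision master weights
W_q = quantize(W, bits=1)                           # 1-bit weights for forward

H_out = A_hat @ H @ W_q            # forward: quantized operands only
grad_out = np.ones_like(H_out)     # stand-in upstream gradient
# Straight-through estimator: treat quantization as identity in the
# backward pass, so the gradient is accumulated at full precision.
grad_W = (A_hat @ H).T @ grad_out
```

In this pattern the full-precision `W` is what the optimizer updates with `grad_W`; `W_q` is re-derived from it each step, which is one software analogue of widening the data path only where gradient accuracy matters.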
Date of Conference: 29-30 October 2024
Date Added to IEEE Xplore: 18 November 2024