Abstract:
With the drive toward ever more intelligent devices, neural networks (NNs) are deployed on smaller and smaller systems. For these embedded microcontrollers, memory consumption becomes a significant challenge. We propose multiple encoding schemes that convert the decrease in parameter counts, achieved through unstructured pruning, into tangible memory savings. We first discuss a sparse encoding scheme for arbitrary sparse matrices that is based on encoding offsets from a predicted even spacing of elements in a row. The compression rate of this scheme is improved further by identifying groups of elements that can be encoded with even lower overhead. Both methods are combined into a hybrid scheme that encodes arbitrary sparse matrices with low overhead while allowing parallel access to multiple elements in a row at once, an important feature for using the scheme on the latest generation of microcontrollers with parallel single-instruction-multiple-data (SIMD) capabilities. Our scheme compresses sparse models to below the size of their dense counterparts at sparsities as low as 30% and reduces model size by 32.4% and 26.4%, at less than one percentage point of accuracy loss, for two convolutional NN tasks in our evaluation.
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (Volume 43, Issue 12, December 2024)
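The core idea of the offset-based encoding can be illustrated with a minimal sketch. The C program below is our own illustration, not the paper's bit-level format: the names predict_col, encode_offsets, and decode_col, the rounding rule, and the example row are all assumptions. It stores, for each nonzero in a row, only its deviation from an evenly spaced predicted position; for unstructured sparsity these deviations tend to be small, so they can be packed into few bits.

```c
#include <stdio.h>

/* Hypothetical sketch of offset-from-even-spacing encoding for one
 * sparse row: with n nonzeros in a row of m columns, the k-th nonzero
 * is predicted at column round(k * m / n); only the deviation from
 * that prediction is stored. */

/* Predicted column of the k-th of n nonzeros in a row of m columns. */
static int predict_col(int k, int n, int m) {
    return (int)(((long)k * m + n / 2) / n);  /* round(k * m / n) */
}

/* Encode: offsets[k] = actual column - predicted column. */
static void encode_offsets(const int *cols, int n, int m, int *offsets) {
    for (int k = 0; k < n; ++k)
        offsets[k] = cols[k] - predict_col(k, n, m);
}

/* Decode: recover a column index from its stored offset. */
static int decode_col(int k, int n, int m, const int *offsets) {
    return predict_col(k, n, m) + offsets[k];
}

int main(void) {
    /* Example row: 4 nonzeros spread over 16 columns. */
    const int m = 16, n = 4;
    const int cols[4] = {1, 4, 9, 13};
    int offsets[4];

    encode_offsets(cols, n, m, offsets);
    for (int k = 0; k < n; ++k)
        printf("k=%d predicted=%d offset=%d decoded=%d\n",
               k, predict_col(k, n, m), offsets[k],
               decode_col(k, n, m, offsets));
    return 0;
}
```

Because each predicted position can be computed independently from k, n, and m, several offsets in a row can in principle be resolved in parallel, which is the property the abstract highlights for SIMD-capable microcontrollers.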