Abstract:
Compression of machine learning models, and of neural networks in particular, has become an essential problem among practitioners. Many different approaches, including quantization, pruning, and low-rank and tensor decompositions, have been proposed in the literature to solve the problem. Despite this, an important question remains unanswered: what is the best compression scheme for a model? As a step towards answering this question objectively and fairly, we empirically compare quantization, pruning, and low-rank compressions on the common algorithmic footing of the Learning-Compression (LC) framework. This allows us to explore the compression schemes systematically and perform an apples-to-apples comparison along entire error-compression tradeoff curves. We describe our methodology, the framework, and the experimental setup, and present our comparisons. Based on our experiments, we conclude that the choice of compression is strongly model-dependent: for example, VGG16 is better compressed with pruning, while quantization is more suitable for ResNets. This, once again, underlines the need for a common benchmark of compression schemes with fair and objective comparisons of the models of interest.
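To make the three compared schemes concrete, the following is a minimal illustrative sketch (not the paper's LC toolkit code) of the compression step of each scheme applied to a single weight matrix, using NumPy only; the matrix shape, sparsity level, bit width, rank, and function names are assumptions chosen for illustration. In the LC framework these compression steps would alternate with learning steps on the network weights; here each is shown in isolation.

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 128))   # hypothetical layer weights

# 1) Magnitude pruning: keep the k largest-magnitude weights, zero the rest.
def prune(W, k):
    flat = np.abs(W).ravel()
    thresh = np.partition(flat, flat.size - k)[flat.size - k]
    return np.where(np.abs(W) >= thresh, W, 0.0)

# 2) Scalar quantization: cluster weights into 2^bits shared values (Lloyd's k-means).
def quantize(W, bits=2, iters=20):
    flat = W.ravel()
    codebook = np.linspace(flat.min(), flat.max(), 2 ** bits)
    for _ in range(iters):
        idx = np.argmin(np.abs(flat[:, None] - codebook[None, :]), axis=1)
        for c in range(codebook.size):
            if np.any(idx == c):
                codebook[c] = flat[idx == c].mean()
    return codebook[idx].reshape(W.shape)

# 3) Low-rank compression: best rank-r approximation via truncated SVD.
def low_rank(W, r):
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

for name, Wc in [("pruned", prune(W, k=W.size // 10)),
                 ("quantized", quantize(W, bits=2)),
                 ("low-rank", low_rank(W, r=16))]:
    err = np.linalg.norm(W - Wc) / np.linalg.norm(W)
    print(f"{name:>9s}: relative error {err:.3f}")

Sweeping the per-scheme hyperparameter (number of kept weights, bit width, or rank) is what traces out the error-compression tradeoff curves along which the paper compares the schemes.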
Date of Conference: 18-22 July 2021
Date Added to IEEE Xplore: 20 September 2021