An Empirical Comparison of Quantization, Pruning and Low-rank Neural Network Compression using the LC Toolkit



Abstract:

Compression of machine learning models, and of neural networks in particular, has become an essential problem for practitioners. Many different approaches, including quantization, pruning, and low-rank and tensor decompositions, have been proposed in the literature to solve it. Despite this, an important question remains unanswered: what is the best compression scheme for a given model? As a step towards answering this question objectively and fairly, we empirically compare quantization, pruning, and low-rank compression on the common algorithmic footing of the Learning-Compression (LC) framework. This allows us to explore the compression schemes systematically and to perform an apples-to-apples comparison along the entire error-compression tradeoff curve of each scheme. We describe our methodology, the framework, and the experimental setup, and present our comparisons. Based on our experiments, we conclude that the choice of compression is strongly model-dependent: for example, VGG16 is better compressed with pruning, while quantization is more suitable for ResNets. This once again underlines the need for a common benchmark of compression schemes, with fair and objective comparisons on the models of interest.
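The LC framework places all compression schemes on the same algorithmic footing by alternating two steps: a learning (L) step, which retrains the model weights under a quadratic penalty that pulls them toward their compressed counterparts, and a compression (C) step, which projects the current weights onto the set of models representable by the chosen scheme. Below is a minimal sketch of this alternation, assuming PyTorch; the magnitude-pruning C-step, the mu schedule, and names such as lc_train and keep_ratio are illustrative assumptions for this sketch, not the LC toolkit's actual API.

# Minimal sketch of the Learning-Compression (LC) alternation (quadratic-penalty
# form), assuming a PyTorch model. The C-step here is magnitude pruning; names
# like `lc_train` and `keep_ratio` are hypothetical, not the LC toolkit's API.
import torch

def magnitude_prune(w: torch.Tensor, keep_ratio: float = 0.1) -> torch.Tensor:
    """C-step example: keep the largest-magnitude weights, zero out the rest."""
    k = max(1, int(keep_ratio * w.numel()))
    # Threshold = k-th largest absolute value = (numel - k + 1)-th smallest.
    threshold = w.abs().flatten().kthvalue(w.numel() - k + 1).values
    return torch.where(w.abs() >= threshold, w, torch.zeros_like(w))

def lc_train(model, loss_fn, data_loader, num_lc_steps=10, mu=1e-3, lr=1e-3):
    """Alternate L-steps (penalized retraining) and C-steps (projection)."""
    # theta holds the current compressed weights, one tensor per parameter.
    theta = {n: magnitude_prune(p.detach().clone())
             for n, p in model.named_parameters()}
    for step in range(num_lc_steps):
        # L-step: minimize loss(w) + (mu/2) * ||w - theta||^2 over the weights w.
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for x, y in data_loader:
            opt.zero_grad()
            penalty = sum(((p - theta[n]) ** 2).sum()
                          for n, p in model.named_parameters())
            (loss_fn(model(x), y) + 0.5 * mu * penalty).backward()
            opt.step()
        # C-step: project the retrained weights back onto the compressed set.
        theta = {n: magnitude_prune(p.detach().clone())
                 for n, p in model.named_parameters()}
        mu *= 1.5  # typical schedule: increase mu so w is driven toward theta
    return theta

Swapping the C-step is what makes the comparison apples-to-apples: roughly speaking, replacing magnitude pruning with a k-means fit of a weight codebook gives quantization, and replacing it with a truncated SVD of each layer's weight matrix gives low-rank compression, while the L-step and mu schedule stay identical across schemes.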
Date of Conference: 18-22 July 2021
Date Added to IEEE Xplore: 20 September 2021
Conference Location: Shenzhen, China
