Squashed Weight Distribution for Low Bit Quantization of Deep Models

Strom, Nikko; Khan, Haidar; Hamza, Wael

doi:10.21437/Interspeech.2022-50

Inference with large deep learning models in resource-constrained settings is increasingly a bottleneck in real-world applications of state-of-the-art AI. Here we address this with low-precision weight quantization. We achieve very low accuracy degradation by reparametrizing the weights so that their distribution is approximately uniform. In experiments on the GLUE benchmarks (3-bit, 0.2% relative degradation) and on internal intent/slot-filling datasets (2-bit, 0.4% relative degradation), we show lower bit widths and less accuracy degradation than previously reported.
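
One way to see why an approximately uniform weight distribution suits low-bit quantization is to measure how evenly the 2^b quantization codes get used. The sketch below is an illustration under stated assumptions, not the paper's reparametrization (which the abstract does not specify). It assumes Gaussian-distributed weights and uses the Gaussian CDF as a hypothetical squashing function. The names `cell_index`, `code_entropy` and `compare` and the parameters `sigma` and `seed` are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import NormalDist


def cell_index(u, bits):
    """Mid-rise uniform quantizer on [-1, 1]: map u to one of 2**bits equal-width cells."""
    levels = 2 ** bits
    idx = int((u + 1.0) / 2.0 * levels)
    return min(max(idx, 0), levels - 1)  # clip values at or beyond the range ends


def code_entropy(values, bits):
    """Empirical entropy (in bits) of quantization-cell usage; the maximum is `bits`."""
    counts = Counter(cell_index(v, bits) for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def compare(bits=2, n=20000, sigma=0.05, seed=0):
    """Return (entropy of directly scaled weights, entropy of squashed weights)."""
    rng = random.Random(seed)
    # Assumption: bell-shaped (Gaussian) weights, as is common in trained layers.
    weights = [rng.gauss(0.0, sigma) for _ in range(n)]
    # Baseline: map a +/-3 sigma clipping range onto [-1, 1] and quantize directly.
    direct = [w / (3.0 * sigma) for w in weights]
    # Hypothetical squashing: the Gaussian CDF maps these weights to ~Uniform(-1, 1).
    phi = NormalDist(0.0, sigma).cdf
    squashed = [2.0 * phi(w) - 1.0 for w in weights]
    return code_entropy(direct, bits), code_entropy(squashed, bits)


h_direct, h_squashed = compare()
print(f"direct: {h_direct:.3f} bits, squashed: {h_squashed:.3f} bits (max 2)")
```

With 2 bits, the squashed weights use the four codes almost equally, so their entropy is close to the 2-bit maximum. The directly scaled Gaussian weights rarely reach the two outer cells, and their entropy comes out near 1.57 bits. The mid-rise quantizer, with reconstruction levels at cell centers, is chosen here because it assigns equal probability to every cell when its input is uniform on [-1, 1].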

Cite as: Strom, N., Khan, H., Hamza, W. (2022) Squashed Weight Distribution for Low Bit Quantization of Deep Models. Proc. Interspeech 2022, 3953-3957, doi: 10.21437/Interspeech.2022-50

@inproceedings{strom22_interspeech,
  author={Nikko Strom and Haidar Khan and Wael Hamza},
  title={{Squashed Weight Distribution for Low Bit Quantization of Deep Models}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={3953--3957},
  doi={10.21437/Interspeech.2022-50}
}