Abstract:
Binarization of neural networks' activations may be a requirement for some applications. A typical example is end-to-end learned deep image compression systems, where the encoder's output is required to be a binary vector. Binarization is non-differentiable, so one needs to approximate it in order to train neural networks with stochastic gradient descent. In this paper, we investigate these training strategies and provide improvements over baselines. We find that constraining the activations during training to a region far away from the binary points leads to better performance at test time. This finding is counter-intuitive and motivates re-thinking the binarization approximation problem in neural networks.
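(The abstract does not specify which differentiable approximation of binarization is used; a common choice in this line of work is the straight-through estimator, which binarizes in the forward pass but treats the operation as identity in the backward pass. The following minimal PyTorch sketch is illustrative only; the class name and tensor shapes are not taken from the paper.)

import torch

class BinarizeSTE(torch.autograd.Function):
    """Binarize activations to {-1, +1} in the forward pass; pass the
    gradient straight through (identity) in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        # torch.sign maps positive values to +1 and negative values to -1.
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: pretend binarization was the identity function.
        return grad_output

x = torch.randn(4, requires_grad=True)
b = BinarizeSTE.apply(x)   # binary code, e.g. the encoder's output
b.sum().backward()         # gradients flow back to x despite the non-differentiable step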
Date of Conference: 14-19 July 2019
Date Added to IEEE Xplore: 30 September 2019