Approximating neural machine translation for efficiency
Date: 25/07/2020
Author: Aji, Alham Fikri
Abstract
Neural machine translation (NMT) has been shown to outperform statistical machine translation. However, NMT models typically require a large number of parameters and are expensive to train and deploy. Moreover, their large size makes parallel training inefficient due to costly network communication. Likewise, distributing the model and running it locally on a client, such as a web browser or a mobile device, remains challenging. This thesis investigates ways to approximately train an NMT system by compressing either the gradients or the parameters for faster communication or reduced memory consumption.

We propose a gradient compression technique that exchanges only the top 1% of gradient values by magnitude while delaying the rest to be considered in later iterations. This method reduces the network communication cost 50-fold but causes noisy gradient updates.
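
As a rough illustration, assuming NumPy arrays and hypothetical names (sparse_exchange, keep_ratio), the top-1% exchange with delayed residuals could be sketched as follows; this is a minimal sketch, not the thesis implementation:

    import numpy as np

    def sparse_exchange(gradient, residual, keep_ratio=0.01):
        # Fold in values delayed from earlier iterations, then exchange
        # only the largest fraction (here 1%) of entries by magnitude.
        accumulated = gradient + residual
        flat = np.abs(accumulated).ravel()
        k = max(1, int(keep_ratio * flat.size))
        # Magnitude of the k-th largest entry serves as the drop threshold.
        threshold = np.partition(flat, flat.size - k)[flat.size - k]
        mask = np.abs(accumulated) >= threshold
        sent = np.where(mask, accumulated, 0.0)          # communicated to peers
        new_residual = np.where(mask, 0.0, accumulated)  # delayed locally
        return sent, new_residual, mask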
We also find that the Transformer, the current state-of-the-art NMT architecture, is highly sensitive to noisy gradients. Therefore, we extend the compression technique by restoring the compressed gradient with locally computed gradients. With this correction, we obtain a linear scale-up in parallel training without sacrificing model performance.
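
One plausible reading of this restoration, continuing the sketch above (combine_gradients is a hypothetical name), is to fall back to each worker's own dense gradient wherever a value was dropped from the exchange:

    import numpy as np

    def combine_gradients(sparse_global, local_gradient, mask):
        # Where a value survived compression, use the aggregated global
        # gradient; where it was dropped, restore the update from the
        # worker's locally computed dense gradient.
        return np.where(mask, sparse_global, local_gradient)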
We also explore transfer learning as a better method of initialising training. With transfer learning, the model converges faster and can be trained with more aggressive hyperparameters.
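
In code terms, such initialisation can be as simple as seeding the new model with a pre-trained parent's weights before training begins; transfer_init and the dict-of-arrays layout below are assumptions for illustration:

    def transfer_init(child_params, parent_params):
        # Copy the parent's trained weights wherever names and shapes
        # line up; everything else keeps its fresh random initialisation.
        for name, value in parent_params.items():
            if name in child_params and child_params[name].shape == value.shape:
                child_params[name] = value.copy()
        return child_params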
Lastly, we propose a log-based quantisation method to compress the model size. Models are quantised to 4-bit precision with no noticeable quality degradation when re-training is combined with retaining the quantisation errors as feedback.
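
A minimal sketch of log-based 4-bit quantisation with error feedback follows; the per-tensor scale, the rounding rule, and the name log_quantise are illustrative assumptions rather than the thesis implementation:

    import numpy as np

    def log_quantise(weights, bits=4):
        # Represent each weight as sign * scale * 2^k, where k is one of
        # 2**(bits - 1) negative integer exponents (one bit holds the sign).
        sign = np.sign(weights)
        scale = max(np.abs(weights).max(), 1e-38)  # assumed per-tensor scale
        levels = 2 ** (bits - 1)
        ratio = np.maximum(np.abs(weights), 1e-38) / scale
        exponent = np.clip(np.round(np.log2(ratio)), -(levels - 1), 0)
        quantised = sign * scale * 2.0 ** exponent
        # Retain the quantisation error as feedback for re-training.
        error = weights - quantised
        return quantised, error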