Approximating neural machine translation for efficiency
Date: 25/07/2020
Author: Aji, Alham Fikri
Abstract
Neural machine translation (NMT) has been shown to outperform statistical machine translation. However, NMT models typically require a large number of parameters and are expensive to train and deploy. Moreover, their large size makes parallel training inefficient due to costly network communication. Likewise, distributing the model and running it locally on a client, such as a web browser or a mobile device, remains challenging. This thesis investigates ways to approximately train an NMT system by compressing either the gradients or the parameters for faster communication or reduced memory consumption.

We propose a gradient compression technique that exchanges only the top 1% of gradient values by magnitude while delaying the rest to be considered in later iterations. This method reduces the network communication cost 50-fold but causes noisy gradient updates.
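
As a rough illustration, assuming NumPy arrays and hypothetical names (sparse_exchange, keep_ratio), the top-1% exchange with delayed residuals could be sketched as follows; this is a minimal sketch, not the thesis implementation:

    import numpy as np

    def sparse_exchange(gradient, residual, keep_ratio=0.01):
        # Fold in values delayed from earlier iterations, then exchange
        # only the largest fraction (here 1%) of entries by magnitude.
        accumulated = gradient + residual
        flat = np.abs(accumulated).ravel()
        k = max(1, int(keep_ratio * flat.size))
        # Magnitude of the k-th largest entry serves as the drop threshold.
        threshold = np.partition(flat, flat.size - k)[flat.size - k]
        mask = np.abs(accumulated) >= threshold
        sent = np.where(mask, accumulated, 0.0)          # communicated to peers
        new_residual = np.where(mask, 0.0, accumulated)  # delayed locally
        return sent, new_residual, mask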
We also find that the Transformer, the current state-of-the-art NMT architecture, is highly sensitive to noisy gradients. Therefore, we extend the compression technique by restoring the compressed gradient with locally computed gradients. With this correction, we obtain a linear scale-up in parallel training without sacrificing model performance.
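
One plausible reading of this restoration, continuing the sketch above (combine_gradients is a hypothetical name), is to fall back to each worker's own dense gradient wherever a value was dropped from the exchange:

    import numpy as np

    def combine_gradients(sparse_global, local_gradient, mask):
        # Where a value survived compression, use the aggregated global
        # gradient; where it was dropped, restore the update from the
        # worker's locally computed dense gradient.
        return np.where(mask, sparse_global, local_gradient)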
We also explore transfer learning as a better method of initialising training. With transfer learning, the model converges faster and can be trained with more aggressive hyperparameters.
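
In code terms, such initialisation can be as simple as seeding the new model with a pre-trained parent's weights before training begins; transfer_init and the dict-of-arrays layout below are assumptions for illustration:

    def transfer_init(child_params, parent_params):
        # Copy the parent's trained weights wherever names and shapes
        # line up; everything else keeps its fresh random initialisation.
        for name, value in parent_params.items():
            if name in child_params and child_params[name].shape == value.shape:
                child_params[name] = value.copy()
        return child_params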
Lastly, we propose a log-based quantisation method to compress the model size. Models are quantised to 4-bit precision with no noticeable quality degradation when re-training is combined with retaining the quantisation errors as feedback.
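
A minimal sketch of log-based 4-bit quantisation with error feedback follows; the per-tensor scale, the rounding rule, and the name log_quantise are illustrative assumptions rather than the thesis implementation:

    import numpy as np

    def log_quantise(weights, bits=4):
        # Represent each weight as sign * scale * 2^k, where k is one of
        # 2**(bits - 1) negative integer exponents (one bit holds the sign).
        sign = np.sign(weights)
        scale = max(np.abs(weights).max(), 1e-38)  # assumed per-tensor scale
        levels = 2 ** (bits - 1)
        ratio = np.maximum(np.abs(weights), 1e-38) / scale
        exponent = np.clip(np.round(np.log2(ratio)), -(levels - 1), 0)
        quantised = sign * scale * 2.0 ** exponent
        # Retain the quantisation error as feedback for re-training.
        error = weights - quantised
        return quantised, error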