Abstract
Neural machine translation (NMT) systems fall short when training data is insufficient. For low-resource domain adaptation, meta-learning has proven to be an effective training scheme: it seeks an optimal initialization that adapts easily to new domains. However, standard meta-learning assumes that all samples contribute equally within a task and that all tasks contribute equally to the training task distribution, an assumption that degrades the performance of the meta-model. In the inner loop, we propose a dynamic tuning strategy that distinguishes tasks' adaptation abilities and weights the loss by sample representativeness, so that tasks drawn from the same domain can be discriminated. In the outer loop, to measure the effect of each task on the meta-parameters, we compute an uncertainty-aware confidence and weight the meta-update steps accordingly. Experiments show that the proposed approaches yield stable improvements in all domains (up to +1.35 BLEU points). We also analyze the ability of our strategies to alleviate domain imbalance in non-ideal settings.
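To make the two-level structure concrete, here is a minimal PyTorch sketch of confidence-weighted meta-updates on a toy regression problem. It illustrates the general scheme the abstract describes, not the authors' implementation: the linear model, loss_fn, adapt, mc_confidence (a weight-perturbation stand-in for Monte-Carlo-dropout confidence), and the normalized task weights are all hypothetical placeholders.

```python
import torch

torch.manual_seed(0)

def loss_fn(w, x, y):
    # Squared error of a linear model; stands in for the per-task NMT loss.
    return ((x @ w - y) ** 2).mean()

def adapt(w, support, inner_lr=0.1, steps=3):
    # Inner loop: a few gradient steps on the task's support set.
    # create_graph=True lets the outer update differentiate through them.
    for _ in range(steps):
        (g,) = torch.autograd.grad(loss_fn(w, *support), w, create_graph=True)
        w = w - inner_lr * g
    return w

def mc_confidence(w, x, y, n_samples=8, noise=0.1):
    # Hypothetical uncertainty-aware confidence: variance of the query loss
    # under random weight perturbations (a crude stand-in for MC dropout).
    # Low variance -> high confidence -> larger meta-update weight.
    losses = torch.stack([
        loss_fn(w + noise * torch.randn_like(w), x, y).detach()
        for _ in range(n_samples)
    ])
    return 1.0 / (1.0 + losses.var())

def make_task(dim=4, n=16):
    # A synthetic regression "task" with support and query splits.
    true_w = torch.randn(dim)
    xs, xq = torch.randn(n, dim), torch.randn(n, dim)
    return {"support": (xs, xs @ true_w), "query": (xq, xq @ true_w)}

meta_w = torch.zeros(4, requires_grad=True)  # meta-parameters (the shared initialization)
opt = torch.optim.SGD([meta_w], lr=0.05)

for step in range(100):
    tasks = [make_task() for _ in range(4)]
    opt.zero_grad()
    q_losses, confs = [], []
    for t in tasks:
        w_adapted = adapt(meta_w, t["support"])
        q_losses.append(loss_fn(w_adapted, *t["query"]))
        confs.append(mc_confidence(w_adapted, *t["query"]))
    # Outer loop: normalize confidences into task weights and take a
    # weighted meta-update instead of averaging tasks uniformly.
    weights = torch.stack(confs)
    weights = weights / weights.sum()
    meta_loss = (weights * torch.stack(q_losses)).sum()
    meta_loss.backward()
    opt.step()
```

In the paper's setting, loss_fn would be the NMT objective on a domain's support/query split, and the inner loop would additionally carry the proposed per-sample representativeness weights; this sketch omits both.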
Acknowledgement
This research was funded by the National Natural Science Foundation of China (Grant Nos. 61772337 and U1736207).
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Song, Z., Ma, Z., Qi, K., Liu, G. (2021). Dynamic Tuning and Weighting of Meta-learning for NMT Domain Adaptation. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. Lecture Notes in Computer Science, vol. 12895. Springer, Cham. https://doi.org/10.1007/978-3-030-86383-8_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86382-1
Online ISBN: 978-3-030-86383-8
eBook Packages: Computer Science (R0)