Abstract
Neural Machine Translation (NMT) models are rarely decoupled from their vocabularies, as both are usually trained together in an end-to-end fashion. However, the severity of catastrophic forgetting depends strongly on the vocabulary used. In this work, we study catastrophic forgetting from the vocabulary point of view and present a novel method, based on a continual vocabulary approach, that decouples vocabularies from their NMT models to improve cross-domain performance and mitigate forgetting. Our work shows that the domain of the vocabulary plays a critical role in the cross-domain performance of a model. Therefore, by using a continual vocabulary capable of exploiting subword information to construct embeddings for new words, we can mitigate catastrophic forgetting and achieve more consistent performance across domains.
Supported by Pattern Recognition and Human Language Technology Center (PRHLT).
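The abstract does not spell out how embeddings for new words are built from subword information. Below is a minimal sketch of the general idea, assuming a fastText-style composition in which an unseen word's embedding is the average of its subword embeddings; every name here (`subword_emb`, `segment`, `embed_new_word`, the toy greedy segmenter) is illustrative and not the authors' implementation.

```python
import numpy as np

# Toy subword embedding table. In practice these vectors would come from a
# trained subword-level model (e.g. the embedding layer of an NMT model
# trained over a SentencePiece/BPE vocabulary).
rng = np.random.default_rng(0)
SUBWORDS = ["_un", "fore", "seen", "_", "un", "for", "e", "s", "n"]
subword_emb = {sw: rng.standard_normal(8) for sw in SUBWORDS}

def segment(word: str) -> list[str]:
    """Greedy longest-match segmentation into known subwords (toy BPE-like)."""
    token = "_" + word          # word-boundary marker, SentencePiece-style
    pieces, i = [], 0
    while i < len(token):
        for j in range(len(token), i, -1):
            if token[i:j] in subword_emb:
                pieces.append(token[i:j])
                i = j
                break
        else:
            i += 1              # no known subword starts here: skip the char
    return pieces

def embed_new_word(word: str) -> np.ndarray:
    """Compose an embedding for a word unseen at training time by averaging
    the embeddings of its subwords, so the vocabulary can grow without
    retraining the embedding table."""
    pieces = segment(word)
    if not pieces:
        raise KeyError(f"no known subwords in {word!r}")
    return np.mean([subword_emb[p] for p in pieces], axis=0)

print(segment("unforeseen"))         # ['_un', 'fore', 'seen']
print(embed_new_word("unforeseen"))  # 8-dimensional composed vector
```

Under a scheme like this, adapting to a new domain only requires segmenting its novel words with the existing subword inventory and extending the embedding matrix with the composed vectors rather than reinitializing it, which is one way to realize the vocabulary-model decoupling the paper describes.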
Notes
1. Firth, J. R. (1957:11).
2. The “Domain” in the title refers to the domain of the vocabulary used, while the names of the column groups refer to the training dataset, and the legend refers to the evaluation domain.
3. Custom embeddings trained on Europarl-2M (de-en).
Acknowledgment
Work supported by the European Commission under the Horizon 2020 programme (H2020), SELENE project (grant agreement No. 871467), and by the project Deep Learning for Adaptive and Multimodal Interaction in Pattern Recognition (DeepPattern, grant PROMETEO/2019/121). We gratefully acknowledge the support of NVIDIA Corporation with the donation of a GPU used for part of this research.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Carrión, S., Casacuberta, F. (2023). Continual Vocabularies to Tackle the Catastrophic Forgetting Problem in Machine Translation. In: Pertusa, A., Gallego, A.J., Sánchez, J.A., Domingues, I. (eds) Pattern Recognition and Image Analysis. IbPRIA 2023. Lecture Notes in Computer Science, vol 14062. Springer, Cham. https://doi.org/10.1007/978-3-031-36616-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36615-4
Online ISBN: 978-3-031-36616-1