
Continual Vocabularies to Tackle the Catastrophic Forgetting Problem in Machine Translation

  • Conference paper
  • In: Pattern Recognition and Image Analysis (IbPRIA 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14062)


Abstract

Neural Machine Translation (NMT) models are rarely decoupled from their vocabularies, as both are typically trained together in an end-to-end fashion. However, the severity of the catastrophic forgetting problem is highly dependent on the vocabulary used. In this work, we study catastrophic forgetting from the vocabulary point of view and present a novel method based on a continual vocabulary approach that decouples vocabularies from their NMT models to improve cross-domain performance and mitigate the effects of catastrophic forgetting. Our work shows that the vocabulary domain plays a critical role in the cross-domain performance of a model. Therefore, by using a continual vocabulary capable of exploiting subword information to construct embeddings for new words, we can mitigate catastrophic forgetting and improve performance consistency across domains.
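
To make the mechanism sketched in the abstract more concrete, the snippet below shows one common way to build an embedding for a word that is missing from the original vocabulary: averaging character n-gram (subword) vectors, in the spirit of fastText-style composition. This is a minimal illustrative sketch; the function names, n-gram sizes, and toy vector table are assumptions for exposition and do not reproduce the authors' implementation.

import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams of a word, padded with boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def embed_new_word(word, ngram_vectors, dim):
    """Average the vectors of the word's known character n-grams to obtain an
    embedding for a word unseen by the original (old-domain) vocabulary."""
    vecs = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# Toy usage: compose a vector for a new-domain word from an (assumed) n-gram table.
dim = 4
ngram_vectors = {"<ne": np.ones(dim), "neu": 2 * np.ones(dim)}
print(embed_new_word("neural", ngram_vectors, dim))  # averaged from the two known n-grams
print(embed_new_word("qqqq", ngram_vectors, dim))    # no known n-grams -> zero vector

Composing embeddings this way would let a decoupled, continually growing vocabulary cover new-domain words without retraining the translation model from scratch, which is the property the abstract attributes to the continual-vocabulary approach.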

Supported by the Pattern Recognition and Human Language Technology Center (PRHLT).

Notes

  1. Firth, J. R. (1957:11).

  2. The “Domain” in the title refers to the domain of the vocabulary used, the column-group names refer to the training dataset, and the legend refers to the evaluation domain.

  3. Custom embeddings trained on Europarl-2M (de-en).


Acknowledgment

Work supported by the Horizon 2020 programme of the European Commission (H2020) under the SELENE project (grant agreement no. 871467) and by the project Deep learning for adaptive and multimodal interaction in pattern recognition (DeepPattern, grant agreement PROMETEO/2019/121). We gratefully acknowledge the support of NVIDIA Corporation with the donation of a GPU used for part of this research.

Author information

Corresponding author: Salvador Carrión.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Carrión, S., Casacuberta, F. (2023). Continual Vocabularies to Tackle the Catastrophic Forgetting Problem in Machine Translation. In: Pertusa, A., Gallego, A.J., Sánchez, J.A., Domingues, I. (eds) Pattern Recognition and Image Analysis. IbPRIA 2023. Lecture Notes in Computer Science, vol 14062. Springer, Cham. https://doi.org/10.1007/978-3-031-36616-1_8


  • DOI: https://doi.org/10.1007/978-3-031-36616-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36615-4

  • Online ISBN: 978-3-031-36616-1

  • eBook Packages: Computer Science, Computer Science (R0)
