
Continual Vocabularies to Tackle the Catastrophic Forgetting Problem in Machine Translation

  • Conference paper
  • In: Pattern Recognition and Image Analysis (IbPRIA 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14062)


Abstract

Neural Machine Translation (NMT) models are rarely decoupled from their vocabularies, as both are typically trained together in an end-to-end fashion. However, the severity of the catastrophic forgetting problem is highly dependent on the vocabulary used. In this work, we study catastrophic forgetting from the vocabulary point of view and present a novel method based on a continual vocabulary approach that decouples vocabularies from their NMT models to improve cross-domain performance and mitigate the effects of catastrophic forgetting. Our work shows that the vocabulary domain plays a critical role in the cross-domain performance of a model. Therefore, by using a continual vocabulary capable of exploiting subword information to construct embeddings for new words, we can mitigate catastrophic forgetting and improve performance consistency across domains.
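
To make the mechanism sketched in the abstract more concrete, the snippet below shows one common way to build an embedding for a word that is missing from the original vocabulary: averaging character n-gram (subword) vectors, in the spirit of fastText-style composition. This is a minimal illustrative sketch; the function names, n-gram sizes, and toy vector table are assumptions for exposition and do not reproduce the authors' implementation.

import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams of a word, padded with boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def embed_new_word(word, ngram_vectors, dim):
    """Average the vectors of the word's known character n-grams to obtain an
    embedding for a word unseen by the original (old-domain) vocabulary."""
    vecs = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# Toy usage: compose a vector for a new-domain word from an (assumed) n-gram table.
dim = 4
ngram_vectors = {"<ne": np.ones(dim), "neu": 2 * np.ones(dim)}
print(embed_new_word("neural", ngram_vectors, dim))  # averaged from the two known n-grams
print(embed_new_word("qqqq", ngram_vectors, dim))    # no known n-grams -> zero vector

Composing embeddings this way would let a decoupled, continually growing vocabulary cover new-domain words without retraining the translation model from scratch, which is the property the abstract attributes to the continual-vocabulary approach.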

Supported by the Pattern Recognition and Human Language Technology Center (PRHLT).

Notes

  1. Firth, J. R. (1957:11).

  2. The “Domain” in the title refers to the domain of the vocabulary used, the column-group names refer to the training dataset, and the legend refers to the evaluation domain.

  3. Custom embeddings trained on Europarl-2M (de-en).


Acknowledgment

Work supported by the Horizon 2020 programme of the European Commission (H2020) under the SELENE project (grant agreement no. 871467) and by the project Deep learning for adaptive and multimodal interaction in pattern recognition (DeepPattern, grant agreement PROMETEO/2019/121). We gratefully acknowledge the support of NVIDIA Corporation with the donation of a GPU used for part of this research.

Author information

Corresponding author: Salvador Carrión.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Carrión, S., Casacuberta, F. (2023). Continual Vocabularies to Tackle the Catastrophic Forgetting Problem in Machine Translation. In: Pertusa, A., Gallego, A.J., Sánchez, J.A., Domingues, I. (eds) Pattern Recognition and Image Analysis. IbPRIA 2023. Lecture Notes in Computer Science, vol 14062. Springer, Cham. https://doi.org/10.1007/978-3-031-36616-1_8


  • DOI: https://doi.org/10.1007/978-3-031-36616-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36615-4

  • Online ISBN: 978-3-031-36616-1

  • eBook Packages: Computer Science, Computer Science (R0)
