
Neural Translation System of Meta-Domain Transfer Based on Self-Ensemble and Self-Distillation

Published in Automatic Control and Computer Sciences

Abstract

To address the limited feature representation of machine translation (MT) systems, this paper presents a feature transfer method for MT. Its main aim is to transfer knowledge between different training corpora during decoding. The meta domain is modeled, and a model-agnostic self-ensemble and self-distillation training framework is proposed. Training is divided into model training and meta training so that the two types of features are better fitted. Extensive experiments were conducted on a classical neural machine translation system, and the model was compared with classical methods. The results show that the proposed model improves transfer across different domains and systems. Translation knowledge transfer was carried out on a Chinese–English translation dataset with subdivided domains, yielding significant performance improvements in the news, education, and law domains.
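The abstract does not spell out the training procedure, but the two ingredients it names are standard. As a rough, non-authoritative sketch of how they usually fit together (following the common self-ensemble/self-distillation fine-tuning recipe, e.g. Xu et al., arXiv:2002.10345, not necessarily this paper's exact method), the PyTorch fragment below keeps a teacher as a running average of student weights (the self-ensemble, approximated here by an exponential moving average) and trains the student on cross-entropy plus a KL term against that teacher (the self-distillation). Every name in it (make_teacher, distill_step, alpha, tau, the pad id 0, and the model call signature model(src, tgt)) is a placeholder assumption, not the authors' API.

```python
import copy
import torch
import torch.nn.functional as F


def make_teacher(student):
    """Create the self-ensemble teacher as a frozen copy of the student."""
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher.eval()


@torch.no_grad()
def update_self_ensemble(teacher, student, decay=0.999):
    """Fold the current student into the teacher as a running weight average."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)


def distill_step(student, teacher, batch, optimizer, alpha=0.5, tau=2.0):
    """One update: task cross-entropy plus self-distillation from the teacher."""
    src, tgt = batch                        # source/target token-id tensors
    logits = student(src, tgt)              # (batch, len, vocab); assumed call signature
    with torch.no_grad():
        teacher_logits = teacher(src, tgt)  # teacher prediction, no gradient
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         tgt.reshape(-1),
                         ignore_index=0)    # assumes pad id 0
    kl = F.kl_div(F.log_softmax(logits / tau, dim=-1),
                  F.softmax(teacher_logits / tau, dim=-1),
                  reduction="batchmean") * tau * tau
    loss = (1.0 - alpha) * ce + alpha * kl  # alpha balances task loss vs. distillation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    update_self_ensemble(teacher, student)  # refresh the ensemble after each step
    return loss.item()
```

The meta-training half described in the abstract would wrap steps like this in an inner/outer loop in the style of model-agnostic meta-learning, with the outer update aggregated across domains; that part is omitted from the sketch.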



Author information


Corresponding author

Correspondence to Xiaochen Zhang.

Ethics declarations

The authors declare that they have no conflicts of interest.

About this article


Cite this article

Liu, Y., Zhang, Y. & Zhang, X. Neural Translation System of Meta-Domain Transfer Based on Self-Ensemble and Self-Distillation. Aut. Control Comp. Sci. 56, 109–119 (2022). https://doi.org/10.3103/S0146411622020109
