Abstract
To address the limited feature representation of machine translation (MT) systems, this paper presents a feature transfer method for MT. Its main aim is to transfer knowledge across different training corpora during decoding. The meta domain is modeled, and a model-agnostic self-ensemble and self-distillation training framework is proposed. Training is divided into model training and meta training to better accommodate the two types of features. Extensive experiments were conducted on a classical neural machine translation system, and the model was compared with classical methods. The experimental results show that the proposed model improves transfer across different domains and systems. Translation knowledge transfer on fine-grained Chinese-English translation datasets yields significant performance improvements in the news, education, and law domains.
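The abstract does not specify the paper's training procedure, so the following is only a rough, hypothetical sketch of the two ingredients it names: self-ensemble (averaging recent checkpoints into a teacher) and self-distillation (training the student against the teacher's predictions as well as the gold labels). The function names, the checkpoint-averaging scheme, and the `alpha` mixing weight are all illustrative assumptions, not the authors' method.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_ensemble(checkpoints):
    # Self-ensemble: average the parameters of the K most recent
    # checkpoints to form a teacher model (parameter-space averaging).
    return {name: np.mean([c[name] for c in checkpoints], axis=0)
            for name in checkpoints[0]}

def distill_loss(student_logits, teacher_logits, gold, alpha=0.5):
    # Self-distillation: mix cross-entropy on gold labels with a KL
    # term pulling the student toward the ensemble teacher's distribution.
    p_t = softmax(teacher_logits)
    log_p_s = np.log(softmax(student_logits))
    ce = -log_p_s[np.arange(len(gold)), gold].mean()
    kl = (p_t * (np.log(p_t) - log_p_s)).sum(axis=-1).mean()
    return (1 - alpha) * ce + alpha * kl
```

With identical student and teacher logits the KL term vanishes, so only the gold-label cross-entropy contributes; as the student drifts from the teacher, the KL term penalizes the disagreement.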
Ethics declarations
The authors declare that they have no conflicts of interest.
Cite this article
Liu, Y., Zhang, Y., and Zhang, X., Neural translation system of meta-domain transfer based on self-ensemble and self-distillation, Aut. Control Comp. Sci., 2022, vol. 56, pp. 109–119. https://doi.org/10.3103/S0146411622020109