Abstract
Data Diversification is a recently proposed data augmentation method for Neural Machine Translation (NMT). Although it has attracted broad attention for its effectiveness, the reason for its success remains unclear. In this paper, we first establish a connection between data diversification and knowledge distillation, and prove that data diversification reduces modality complexity. We also find that knowledge distillation yields even lower data modality complexity than data diversification, yet struggles to boost performance. Our analysis reveals that knowledge distillation harms the word frequency distribution, increasing the number of rare words whose representations are unreliable. Furthermore, data diversification trains multiple models to further decrease modality complexity, which incurs prohibitive computational cost. To reduce this cost, we propose adjustable sampling, which samples a single model multiple times instead of training multiple models. Unlike other sampling methods, ours uses entropy to adjust the quality and diversity of the generated sentences, reducing modality complexity while limiting the noise introduced. Extensive experiments show that our method dramatically reduces the computational cost of data diversification without loss of accuracy, and outperforms other strong sampling methods.
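The abstract does not spell out the adjustable-sampling procedure, so the following is only a minimal sketch of the general idea under stated assumptions: draw several samples from one trained model and use token-level entropy to trade quality against diversity. The interface `next_token_dist`, the threshold `max_mean_entropy`, and the toy vocabulary are hypothetical placeholders, not the authors' implementation.

```python
# Illustrative sketch (assumed, not the paper's code) of entropy-adjusted sampling:
# sample one model several times, keep diverse hypotheses whose mean token entropy
# is low enough to serve as a rough quality proxy.
import numpy as np

def token_entropy(p):
    """Shannon entropy (in nats) of one next-token distribution."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def sample_translation(next_token_dist, bos_id, eos_id, max_len, rng):
    """Draw one hypothesis by ancestral sampling; also return its mean token entropy."""
    tokens, entropies = [bos_id], []
    for _ in range(max_len):
        p = next_token_dist(tokens)            # hypothetical: model's next-token probs
        entropies.append(token_entropy(p))
        tok = int(rng.choice(len(p), p=p))
        tokens.append(tok)
        if tok == eos_id:
            break
    return tuple(tokens), float(np.mean(entropies))

def adjustable_sampling(next_token_dist, bos_id, eos_id, n_samples=8,
                        max_len=50, max_mean_entropy=1.5, seed=0):
    """Keep distinct samples whose mean entropy stays below a threshold."""
    rng = np.random.default_rng(seed)
    kept = {}
    for _ in range(n_samples):
        hyp, h = sample_translation(next_token_dist, bos_id, eos_id, max_len, rng)
        if h <= max_mean_entropy and hyp not in kept:  # drop noisy or duplicate outputs
            kept[hyp] = h
    return list(kept.items())

# Toy usage with a fake 5-token vocabulary; a real system would query an NMT model here.
if __name__ == "__main__":
    vocab = 5
    def fake_dist(prefix):
        logits = np.sin(np.arange(vocab) + len(prefix))
        p = np.exp(logits)
        return p / p.sum()
    for hyp, h in adjustable_sampling(fake_dist, bos_id=0, eos_id=4):
        print(hyp, round(h, 3))
```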
Acknowledgements
This work is supported by the Guangdong Key Lab of AI and Multi-modal Data Processing, Chinese National Research Fund (NSFC) Project No. 61872239, and the BNU-UIC Institute of Artificial Intelligence and Future Networks, funded by Beijing Normal University (Zhuhai) and the AI-DS Research Hub, BNU-HKBU United International College (UIC), Zhuhai, Guangdong, China.
Cite this paper
Song, Y., Liu, T., Jia, W.: Data diversification revisited: why does it work? In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2021. Lecture Notes in Computer Science, vol. 12893. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86365-4_42