Abstract
Data Diversification is a recently proposed data augmentation method for Neural Machine Translation (NMT). Although it has attracted broad attention for its effectiveness, the reason for its success remains unclear. In this paper, we first establish a connection between data diversification and knowledge distillation, and prove that data diversification reduces modality complexity. We also find that knowledge distillation yields even lower data modality complexity than data diversification, yet struggles to boost performance. Our analysis reveals that knowledge distillation harms the word frequency distribution, increasing the number of rare words whose representations are unreliable. Furthermore, data diversification trains multiple models to further decrease modality complexity, which incurs prohibitive computational cost. To reduce this cost, we propose adjustable sampling, which samples a single model multiple times instead of training multiple models. Unlike other sampling methods, ours uses entropy to adjust the quality and diversity of the generated sentences, reducing modality complexity while limiting the noise introduced. Extensive experiments show that our method dramatically reduces the computational cost of data diversification without loss of accuracy, and outperforms other strong sampling methods.
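The abstract does not spell out the adjustable-sampling procedure, so the following is only a minimal sketch of the general idea under stated assumptions: draw several samples from one trained model and use token-level entropy to trade quality against diversity. The interface `next_token_dist`, the threshold `max_mean_entropy`, and the toy vocabulary are hypothetical placeholders, not the authors' implementation.

```python
# Illustrative sketch (assumed, not the paper's code) of entropy-adjusted sampling:
# sample one model several times, keep diverse hypotheses whose mean token entropy
# is low enough to serve as a rough quality proxy.
import numpy as np

def token_entropy(p):
    """Shannon entropy (in nats) of one next-token distribution."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def sample_translation(next_token_dist, bos_id, eos_id, max_len, rng):
    """Draw one hypothesis by ancestral sampling; also return its mean token entropy."""
    tokens, entropies = [bos_id], []
    for _ in range(max_len):
        p = next_token_dist(tokens)            # hypothetical: model's next-token probs
        entropies.append(token_entropy(p))
        tok = int(rng.choice(len(p), p=p))
        tokens.append(tok)
        if tok == eos_id:
            break
    return tuple(tokens), float(np.mean(entropies))

def adjustable_sampling(next_token_dist, bos_id, eos_id, n_samples=8,
                        max_len=50, max_mean_entropy=1.5, seed=0):
    """Keep distinct samples whose mean entropy stays below a threshold."""
    rng = np.random.default_rng(seed)
    kept = {}
    for _ in range(n_samples):
        hyp, h = sample_translation(next_token_dist, bos_id, eos_id, max_len, rng)
        if h <= max_mean_entropy and hyp not in kept:  # drop noisy or duplicate outputs
            kept[hyp] = h
    return list(kept.items())

# Toy usage with a fake 5-token vocabulary; a real system would query an NMT model here.
if __name__ == "__main__":
    vocab = 5
    def fake_dist(prefix):
        logits = np.sin(np.arange(vocab) + len(prefix))
        p = np.exp(logits)
        return p / p.sum()
    for hyp, h in adjustable_sampling(fake_dist, bos_id=0, eos_id=4):
        print(hyp, round(h, 3))
```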
Acknowledgements
This work is supported by the Guangdong Key Lab of AI and Multi-modal Data Processing, Chinese National Research Fund (NSFC) Project No. 61872239, and the BNU-UIC Institute of Artificial Intelligence and Future Networks, funded by Beijing Normal University (Zhuhai) and the AI-DS Research Hub, BNU-HKBU United International College (UIC), Zhuhai, Guangdong, China.
Cite this paper
Song, Y., Liu, T., Jia, W.: Data diversification revisited: why does it work? In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2021. Lecture Notes in Computer Science, vol. 12893. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86365-4_42