
Data Diversification Revisited: Why Does It Work?

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12893)

Abstract

Data Diversification is a recently proposed data augmentation method for Neural Machine Translation (NMT). While it has attracted broad attention due to its effectiveness, the reason for its success remains unclear. In this paper, we first establish a connection between data diversification and knowledge distillation, and prove that data diversification reduces the complexity of data modality. We also find that knowledge distillation achieves an even lower modality complexity than data diversification, yet struggles to boost performance. Our analysis reveals that knowledge distillation has a negative impact on the word frequency distribution: it increases the proportion of rare words, whose representations are unreliable. Furthermore, data diversification trains multiple models to decrease the modality complexity further, which incurs a prohibitive computational cost. To reduce this cost, we propose adjustable sampling, which samples from a single model multiple times instead of training multiple models. Unlike other sampling methods, ours uses entropy to adjust the quality and diversity of the generated sentences, reducing modality complexity while limiting the noise introduced. Extensive experimental results show that our method dramatically reduces the computational cost of data diversification without loss of accuracy, and achieves improvements over other strong sampling methods.
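
The abstract describes adjustable sampling only at a high level: synthetic targets are generated by sampling a single trained model several times, with entropy used to balance quality against diversity. The snippet below is a minimal, hypothetical sketch of one entropy-gated decoding step in PyTorch, not the authors' implementation; the function name and the `entropy_threshold` and `temperature` parameters are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def entropy_adjusted_sample(logits: torch.Tensor,
                            entropy_threshold: float = 2.0,
                            temperature: float = 0.8) -> torch.Tensor:
    """Pick the next target token from decoder logits of shape (batch, vocab).

    Hypothetical sketch: when the predictive distribution has low entropy the
    model is confident, so we keep the argmax (quality); when entropy is high
    we sample from a temperature-sharpened distribution (diversity).
    """
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Shannon entropy of the next-token distribution, one value per sentence.
    entropy = -(probs * log_probs).sum(dim=-1)

    greedy = probs.argmax(dim=-1)                        # confident case
    sharpened = F.softmax(logits / temperature, dim=-1)  # uncertain case
    sampled = torch.multinomial(sharpened, num_samples=1).squeeze(-1)

    return torch.where(entropy <= entropy_threshold, greedy, sampled)
```

Decoding the same source sentence with such a step several times would yield several distinct synthetic translations from a single trained model, which is the computational saving the abstract claims over training multiple models.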


Notes

  1. https://wit3.fbk.eu.
  2. https://github.com/fxsjy/jieba.


Acknowledgements

This work is supported by Guangdong Key Lab of AI and Multi-modal Data Processing, Chinese National Research Fund (NSFC) Project No. 61872239; BNU-UIC Institute of Artificial Intelligence and Future Networks funded by Beijing Normal University (Zhuhai) and AI-DS Research Hub, BNU-HKBU United International College (UIC), Zhuhai, Guangdong, China.

Author information

Corresponding author

Correspondence to Weijia Jia.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Song, Y., Liu, T., Jia, W. (2021). Data Diversification Revisited: Why Does It Work? In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. Lecture Notes in Computer Science, vol 12893. Springer, Cham. https://doi.org/10.1007/978-3-030-86365-4_42

  • DOI: https://doi.org/10.1007/978-3-030-86365-4_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86364-7

  • Online ISBN: 978-3-030-86365-4

  • eBook Packages: Computer Science, Computer Science (R0)
