Data Augmentation with Transformers for Text Classification

  • Conference paper
  • In: Advances in Computational Intelligence (MICAI 2020)

Abstract

The current deep learning revolution has established transformer-based architectures as the state of the art in several natural language processing tasks. However, it is not clear whether such models can also be used to enhance other aspects of the learning pipeline in the NLP context. This paper presents a study in that direction; in particular, we explore the suitability of transformer models as a data augmentation mechanism for text classification. We introduce four ways of using transformer models to augment training data for text classification. Each of these variants takes the outputs of a transformer model, fed with the training documents, and uses those outputs as additional training data. The proposed strategies are evaluated on benchmark data using CNN- and LSTM-based classifiers. Experimental results are promising: improvements over a model trained on the plain documents are consistent.
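The page carries no source code, but the mechanism the abstract describes lends itself to a short illustration. The sketch below shows, under stated assumptions, what one such variant could look like: a pre-trained masked language model (Hugging Face's "bert-base-uncased" is an assumed, illustrative choice, as is the one-mask-per-copy policy) is fed a training document with one word masked out, and its predictions become label-preserving synthetic documents appended to the training set.

```python
# Hedged sketch of transformer-based text augmentation; the model choice and
# masking policy are illustrative assumptions, not the authors' exact setup.
import random
from transformers import pipeline  # Hugging Face Transformers

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def augment(text, n_copies=2):
    """Generate up to `n_copies` label-preserving variants of `text`, each
    with one randomly chosen word replaced by a masked-LM prediction."""
    words = text.split()
    if len(words) < 2:
        return []
    variants = []
    for _ in range(n_copies):
        i = random.randrange(len(words))
        masked = words.copy()
        masked[i] = fill_mask.tokenizer.mask_token  # "[MASK]" for BERT
        for pred in fill_mask(" ".join(masked), top_k=5):
            # Keep the best-scoring substitute that actually changes the word.
            if pred["token_str"].strip().lower() != words[i].lower():
                variants.append(pred["sequence"])
                break
    return variants

# Augmented copies inherit the original document's label and are simply
# appended to the training set before fitting the CNN/LSTM classifier.
train_texts, train_labels = ["the plot was utterly predictable"], [0]
for text, label in list(zip(train_texts, train_labels)):
    for new_text in augment(text):
        train_texts.append(new_text)
        train_labels.append(label)
```

Other variants in the same spirit could instead sample continuations from a generative model such as GPT-2; in every case it is the enlarged training set, not the augmentation model, that the downstream classifier is trained on.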



Acknowledgements

This work was partially supported by CONACyT under project grant A1-S-26314, "Integración de Visión y Lenguaje mediante Representaciones Multimodales Aprendidas para Clasificación y Recuperación de Imágenes y Videos" (Integration of Vision and Language through Learned Multimodal Representations for Image and Video Classification and Retrieval).

Author information


Corresponding author

Correspondence to Hugo Jair Escalante.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Tapia-Téllez, J.M., Escalante, H.J. (2020). Data Augmentation with Transformers for Text Classification. In: Martínez-Villaseñor, L., Herrera-Alcántara, O., Ponce, H., Castro-Espinoza, F.A. (eds) Advances in Computational Intelligence. MICAI 2020. Lecture Notes in Computer Science, vol. 12469. Springer, Cham. https://doi.org/10.1007/978-3-030-60887-3_22

  • DOI: https://doi.org/10.1007/978-3-030-60887-3_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60886-6

  • Online ISBN: 978-3-030-60887-3

  • eBook Packages: Computer Science, Computer Science (R0)
