Data augmentation for aspect-based sentiment analysis

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

In recent years, deep learning has been widely used in natural language processing (NLP), achieving notable success across many NLP tasks. This success is largely due to its ability to learn feature representations automatically from text data. However, the performance of deep learning in NLP can suffer when a sufficiently large labeled training corpus is not available. Data augmentation addresses this small-data problem by expanding the number of samples for the classes in the training corpus. This paper introduces data augmentation for aspect-based sentiment analysis (ABSA), a classical research topic in NLP that has been applied in various fields. The study aims to improve the classification performance of ABSA through augmentation. Two specific augmentation strategies are presented: part-of-speech (PoS) wise synonym substitution (PWSS) and dependency relation-based word swap (DRAWS), which augment data using PoS, external domain knowledge, and syntactic dependency. These strategies are evaluated through extensive experiments on four public datasets using three representative deep learning models: aspect-specific graph convolutional network (ASGCN), content attention-based aspect-based sentiment classification (CABASC), and long short-term memory (LSTM) network. Compared with training without data augmentation, the strategies achieve a Macro-F1 gain of up to 11.49%, with the lowest gain being 2.9%. The experimental results demonstrate that the proposed data augmentation strategies are useful for training deep learning models on small corpora.
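
The abstract names PoS-wise synonym substitution but does not spell out its procedure, so the following is only a minimal sketch of the general idea and not the authors' implementation. It assumes the sentence is already PoS-tagged, that the aspect term must stay untouched so the sentiment label remains valid, and that synonyms come from a small hand-written table. The names `SYNONYMS`, `pos_wise_substitute`, `aspect_indices`, and `allowed_pos` are hypothetical; a real pipeline would more likely take tags from a tagger and synonyms from a lexical resource.

```python
import random

# Hypothetical toy lexicon keyed by (lowercased word, coarse PoS tag).
# Keying on the tag keeps a substitute in the same word class as the original.
SYNONYMS = {
    ("great", "ADJ"): ["excellent", "superb"],
    ("slow", "ADJ"): ["sluggish"],
    ("loved", "VERB"): ["enjoyed", "adored"],
}


def pos_wise_substitute(tagged_tokens, aspect_indices,
                        allowed_pos=("ADJ", "VERB"), rng=None):
    """Return a new token list in which eligible words are replaced by a
    same-PoS synonym. Aspect tokens and words without synonyms are kept."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    augmented = []
    for i, (word, pos) in enumerate(tagged_tokens):
        candidates = SYNONYMS.get((word.lower(), pos), [])
        if i in aspect_indices or pos not in allowed_pos or not candidates:
            augmented.append(word)  # leave this token unchanged
        else:
            augmented.append(rng.choice(candidates))
    return augmented


# Assumed input: (token, PoS) pairs; the aspect term "battery life" is at 1-2.
sentence = [
    ("The", "DET"), ("battery", "NOUN"), ("life", "NOUN"), ("is", "AUX"),
    ("great", "ADJ"), ("but", "CCONJ"), ("the", "DET"), ("screen", "NOUN"),
    ("is", "AUX"), ("slow", "ADJ"),
]
aspect = {1, 2}

augmented = pos_wise_substitute(sentence, aspect)
print(" ".join(augmented))
```

The augmented sentence would be added to the training set with the same aspect term and polarity label as the original, which is why the sketch never rewrites the aspect tokens.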

Notes

  1. https://alt.qcri.org/semeval2014/task4/

  2. https://alt.qcri.org/semeval2014/task4/

  3. https://alt.qcri.org/semeval2015/task12/

  4. https://alt.qcri.org/semeval2016/task5/

  5. https://spacy.io

  6. https://git.io/JGcuK

References

  1. Liu Q, Zhang H, Zeng Y, Huang Z, Wu Z (2018) Content Attention Model for Aspect Based Sentiment Analysis. In: Proceedings of the 2018 World Wide Web Conference, WWW ’18, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, pp. 1023–1032. https://doi.org/10.1145/3178876.3186001

  2. Zhang C, Li Q, Song D. Aspect-based Sentiment Classification with Aspect-specific Graph Convolutional Networks. arXiv:1909.03477 [cs]

  3. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

  4. Johnson JM, Khoshgoftaar TM (2019) Survey on deep learning with class imbalance. J Big Data 6(1):1–54

  5. López V, Fernández A, García S, Palade V, Herrera F (2013) An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Inform Sci 250:113–141

  6. Thabtah F, Hammoud S, Kamalov F, Gonsalves A (2020) Data imbalance in classification: experimental evaluation. Inform Sci 513:429–441

  7. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inform Process Syst 25:1097–1105

  8. Wang J, Perez L The effectiveness of data augmentation in image classification using deep learning, Convolutional Neural Networks Vis. Recognit 11

  9. Singh J, McCann B, Keskar NS, Xiong C, Socher R. XLDA: Cross-Lingual Data Augmentation for Natural Language Inference and Question Answering. arXiv:1905.11471 [cs]

  10. Min J, McCoy RT, Das D, Pitler E, Linzen T. Syntactic data augmentation increases robustness to inference heuristics. arXiv:2004.11999

  11. Sennrich R, Haddow B, Birch A. Improving Neural Machine Translation Models with Monolingual Data. arXiv:1511.06709 [cs]

  12. Fadaee M, Bisazza A, Monz C (2017) Data Augmentation for Low-Resource Neural Machine Translation. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 567–573. arXiv:1705.00440. https://doi.org/10.18653/v1/P17-2090

  13. Dai X, Adel H. An Analysis of Simple Data Augmentation for Named Entity Recognition. arXiv:2010.11683 [cs]

  14. Fellbaum C (2012). The Encyclopedia of Applied Linguistics. https://doi.org/10.1002/9781405198431.wbeal1285

  15. Mueller J, Thyagarajan A (2016) Siamese recurrent architectures for learning sentence similarity. In: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 30

  16. Wei J, Zou K (2019) EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, pp. 6381–6387. https://doi.org/10.18653/v1/D19-1670

  17. Zhang X, Zhao J, LeCun Y. Character-level convolutional networks for text classification. arXiv:1509.01626

  18. Coulombe C. Text data augmentation made simple by leveraging NLP cloud APIs. arXiv:1812.04718

  19. Luque FM. Atalaya at TASS 2019: Data augmentation and robust embeddings for sentiment analysis. arXiv:1909.11241

  20. Zhang Y, Ge T, Sun X. Parallel data augmentation for formality style transfer. arXiv:2005.07522

  21. Xie Q, Dai Z, Hovy E, Luong M-T, Le QV. Unsupervised data augmentation for consistency training. arXiv:1904.12848

  22. Hu Z, Yang Z, Liang X, Salakhutdinov R, Xing EP (2017) Toward controlled generation of text. In: International Conference on Machine Learning, PMLR, pp. 1587–1596

  23. Anaby-Tavor A, Carmeli B, Goldbraich E, Kantor A, Kour G, Shlomov S, Tepper N, Zwerdling N (2020) Do not have enough data? Deep learning to the rescue! In: Proceedings of the AAAI Conference on Artificial Intelligence, 34:7383–7390

  24. Li K, Chen C, Quan X, Ling Q, Song Y. Conditional augmentation for aspect term extraction via masked sequence-to-sequence generation. arXiv:2004.14769

  25. Kobayashi S. Contextual augmentation: Data augmentation by words with paradigmatic relations. arXiv:1805.06201

  26. Robinson JJ (1970) Dependency structures and transformational rules, Language 259–285

  27. Miao Z, Li Y, Wang X, Tan W-C (2020) Snippext: Semi-supervised opinion mining with augmented data. In: Proceedings of The Web Conference 2020, pp. 617–628

  28. Jeni LA, Cohn JF, De La Torre F (2013) Facing Imbalanced Data: Recommendations for the Use of Performance Metrics. In: International Conference on Affective Computing and Intelligent Interaction and Workshops (ACII), pp. 245–251. https://doi.org/10.1109/ACII.2013.47

  29. Pennington J, Socher R, Manning CD (2014) GloVe: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543

  30. Xu C, Wang H, Wu S, Lin Z (2021) Treelstm with tag-aware hypernetwork for sentence representation. Neurocomputing 434:11–20

Acknowledgements

We thank Xiang Dai [13] for the helpful suggestion. This research was supported by the Natural Science Foundation of Hubei Province of China (Grant No. 2020CFB828), the Hubei Normal University Research Project on Teaching Reform (Grant No. XJ202001), the Teaching Research Project of Hubei Normal University (Grant No. 2019030), the Research Project of Young Teachers in Hubei Normal University (Grant No. HS2020QN029), and the Science and Technology Research Project of Hubei Department of Education (Grant No. D20212503).

Author information

Corresponding author

Correspondence to Guangmin Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, G., Wang, H., Ding, Y. et al. Data augmentation for aspect-based sentiment analysis. Int. J. Mach. Learn. & Cyber. 14, 125–133 (2023). https://doi.org/10.1007/s13042-022-01535-5

  • DOI: https://doi.org/10.1007/s13042-022-01535-5
