Data-Augmented Emoji Approach to Sentiment Classification of Tweets

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12702)

Abstract

Natural Language Processing has made great strides recently, and many challenging tasks now have better solutions. One of these tasks is Sentiment Analysis, the subject of this work. We propose a novel methodology for classifying the sentiment of tweets, based on BERT and focusing on emoji. Our method also employs data augmentation to improve its generalization ability. Experiments on two Brazilian Portuguese datasets, TweetSentBR and 2000-tweets-BR, show that our methodology produces better results than BERT and outperforms the previously published results on both. On TweetSentBR, it achieves an accuracy of 0.7726 (an improvement of 6.3 percentage points (p.p.)) and an F\(_{1}\) score of 0.7514 (9.5 p.p. of improvement). On 2000-tweets-BR, it achieves an accuracy of 0.8247 (14.5 p.p. of improvement) and an F\(_{1}\) score of 0.8035 (23.4 p.p. of improvement).
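To make the percentage-point figures concrete, the short Python sketch below recovers the earlier scores implied by the abstract. It assumes each improvement is measured against the best previously published result for the same dataset and metric. The dictionary layout and the function name `implied_baseline` are illustrative and do not come from the paper; the derived baselines are arithmetic consequences of the reported numbers, not values quoted from earlier work.

```python
# Reported scores and improvements (in percentage points) from the abstract.
REPORTED = {
    ("TweetSentBR", "accuracy"): (0.7726, 6.3),
    ("TweetSentBR", "f1"): (0.7514, 9.5),
    ("2000-tweets-BR", "accuracy"): (0.8247, 14.5),
    ("2000-tweets-BR", "f1"): (0.8035, 23.4),
}


def implied_baseline(score: float, gain_pp: float) -> float:
    """Return the earlier score implied by a gain stated in percentage points."""
    # One percentage point equals 0.01 on a 0-1 scale.
    # Assumption: the gain is an absolute difference, not a relative one.
    return round(score - gain_pp / 100.0, 4)


if __name__ == "__main__":
    for (dataset, metric), (score, gain) in REPORTED.items():
        previous = implied_baseline(score, gain)
        print(f"{dataset} {metric}: {score:.4f} (implied previous best {previous:.4f})")
```

A percentage point is an absolute difference, so a 6.3 p.p. gain on an accuracy of 0.7726 implies an earlier accuracy of about 0.7096, rather than a 6.3% relative increase.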

Notes

  1. Jessica is beautiful and has a sense of humor.

  2. When was that? Didn’t she win with a gnocchi recipe?

  3. That’s terrible... it’s like a horror freak show.

  4. Emoticons considered: S2  <3  ;D  :D  ;-)  :-)  ;)  =)  :)  ;-(  :-(  ;(  =(  :(
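The emoticon inventory in note 4 can be turned into a simple matcher. The Python sketch below assumes whitespace tokenization and exact token matches; the paper excerpt only lists the emoticons and does not say how they are detected or used, so the names `EMOTICONS` and `find_emoticons` and the matching strategy are hypothetical.

```python
from typing import List

# Emoticons listed in note 4.
EMOTICONS = frozenset({
    "S2", "<3", ";D", ":D", ";-)", ":-)", ";)", "=)", ":)",
    ";-(", ":-(", ";(", "=(", ":(",
})


def find_emoticons(text: str) -> List[str]:
    """Return the listed emoticons that appear as whitespace-separated tokens."""
    # Assumption: emoticons are separated from words by whitespace, so an
    # emoticon glued to a word (e.g. "lindo:)") is not detected here.
    return [token for token in text.split() if token in EMOTICONS]


if __name__ == "__main__":
    print(find_emoticons("que dia lindo :) S2"))
```

Exact token matching keeps the sketch short and avoids false positives inside URLs or words; a real pipeline would likely need a more tolerant tokenizer.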

Acknowledgements

The authors would like to thank FAPESP (grants #2015/11937-9, #2017/12646-3, #2017/16246-0 and #2019/20875-8), CNPq (grants #304380/2018-0 and #309330/2018-1) and CAPES for their financial support.

Author information

Corresponding author

Correspondence to Helio Pedrini.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

de Barros, T.M., Pedrini, H., Dias, Z. (2021). Data-Augmented Emoji Approach to Sentiment Classification of Tweets. In: Tavares, J.M.R.S., Papa, J.P., González Hidalgo, M. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2021. Lecture Notes in Computer Science, vol 12702. Springer, Cham. https://doi.org/10.1007/978-3-030-93420-0_7

  • DOI: https://doi.org/10.1007/978-3-030-93420-0_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93419-4

  • Online ISBN: 978-3-030-93420-0

  • eBook Packages: Computer Science, Computer Science (R0)