Data-Augmented Emoji Approach to Sentiment Classification of Tweets

de Barros, Tiago Martinho; Pedrini, Helio; Dias, Zanoni

doi:10.1007/978-3-030-93420-0_7

Tiago Martinho de Barros¹,
Helio Pedrini¹ &
Zanoni Dias¹

Conference paper
First Online: 13 January 2022

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12702)

Abstract

Natural Language Processing has advanced rapidly in recent years, and many challenging tasks now have better solutions. One of these tasks is Sentiment Analysis, the subject of this work. We propose a novel methodology for classifying the sentiment of tweets, based on BERT and focusing on emoji. Our method also employs data augmentation to improve its generalization ability. Experiments on two Brazilian Portuguese datasets, TweetSentBR and 2000-tweets-BR, show that our methodology produces better results than BERT and outperforms the previously published results on both. On TweetSentBR, it reaches an accuracy of 0.7726 (an improvement of 6.3 percentage points (p.p.)) and an F\(_{1}\) score of 0.7514 (9.5 p.p. of improvement). On 2000-tweets-BR, it reaches an accuracy of 0.8247 (14.5 p.p. of improvement) and an F\(_{1}\) score of 0.8035 (23.4 p.p. of improvement).
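
To make the reported metrics concrete, the following is a minimal sketch of how accuracy, an F\(_{1}\) score and a gain in percentage points can be computed. It is not the authors' evaluation code: the abstract does not say which F\(_{1}\) averaging was used, so macro averaging is an assumption here, and the function names and toy labels are invented for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def pp_gain(new_score, old_score):
    """Improvement in percentage points between two scores in [0, 1]."""
    return (new_score - old_score) * 100


# Toy labels, invented for illustration only (not taken from either dataset).
gold = ["pos", "neg", "neu", "pos", "neg", "pos"]
pred = ["pos", "neg", "pos", "pos", "neu", "pos"]

print(f"accuracy = {accuracy(gold, pred):.4f}")
print(f"macro F1 = {macro_f1(gold, pred):.4f}")
print(f"gain     = {pp_gain(0.75, 0.70):.1f} p.p.")
```

A gain in percentage points is the plain difference between two scores, scaled by 100; it is not a relative (percent) change.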

Notes

1. Jessica is beautiful and has a sense of humor.
2. When was that? Didn’t she win with a gnocchi recipe?
3. That’s terrible... it’s like a horror freak show.
4. Emoticons considered: S2 <3 ;D :D ;-) :-) ;) =) :) ;-( :-( ;( =( :(
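
The emoticon list in the last note can be turned into a simple matcher. The sketch below is only an illustration and not the authors' preprocessing: the function name `find_emoticons` is hypothetical, and it assumes emoticons are separated from other text by whitespace, so that `S2` is not matched inside a longer token.

```python
import re

# Emoticons taken from the note above.
EMOTICONS = [
    "S2", "<3", ";D", ":D", ";-)", ":-)", ";)", "=)", ":)",
    ";-(", ":-(", ";(", "=(", ":(",
]

# Longest alternatives first, so a longer emoticon is never shadowed by a
# shorter one. The lookarounds require whitespace (or the string boundary)
# on both sides; this delimiting rule is an assumption of this sketch.
_PATTERN = re.compile(
    r"(?<!\S)("
    + "|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True))
    + r")(?!\S)"
)


def find_emoticons(text):
    """Return the listed emoticons found in `text`, in order of appearance."""
    return _PATTERN.findall(text)


print(find_emoticons("great day :) <3"))
print(find_emoticons("my PS2 is old"))
```

Matching on whitespace boundaries trades recall for precision: an emoticon glued to a word, as in `great:)`, is missed under this assumption.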

Acknowledgements

The authors would like to thank FAPESP (grants #2015/11937-9, #2017/12646-3, #2017/16246-0 and #2019/20875-8), CNPq (grants #304380/2018-0 and #309330/2018-1) and CAPES for their financial support.

Author information

Authors and Affiliations

University of Campinas, Institute of Computing, Campinas, SP, Brazil
Tiago Martinho de Barros, Helio Pedrini & Zanoni Dias

Corresponding author

Correspondence to Helio Pedrini.

Editor information

Editors and Affiliations

Universidade do Porto, Porto, Portugal
João Manuel R. S. Tavares
Universidade Estadual Paulista, São Paulo, Brazil
João Paulo Papa
University of the Balearic Islands, Palma de Mallorca, Spain
Manuel González Hidalgo

About this paper

Cite this paper

de Barros, T.M., Pedrini, H., Dias, Z. (2021). Data-Augmented Emoji Approach to Sentiment Classification of Tweets. In: Tavares, J.M.R.S., Papa, J.P., González Hidalgo, M. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2021. Lecture Notes in Computer Science, vol 12702. Springer, Cham. https://doi.org/10.1007/978-3-030-93420-0_7

DOI: https://doi.org/10.1007/978-3-030-93420-0_7
Published: 13 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93419-4
Online ISBN: 978-3-030-93420-0
eBook Packages: Computer Science, Computer Science (R0)
