Abstract
Natural Language Processing has advanced rapidly in recent years, and many challenging tasks now have better solutions as a result. One of these tasks is Sentiment Analysis, the subject of this work. We propose a novel methodology for classifying the sentiment of tweets, based on BERT and focusing on emoji. Our method also employs data augmentation to improve its generalization ability. Experiments on two Brazilian Portuguese datasets, TweetSentBR and 2000-tweets-BR, show that our methodology produces better results than BERT and outperforms the previously published results for TweetSentBR, with accuracy of 0.7726 (an improvement of 6.3 percentage points (p.p.)) and F\(_{1}\) score of 0.7514 (9.5 p.p. of improvement), as well as for 2000-tweets-BR, with accuracy of 0.8247 (14.5 p.p. of improvement) and F\(_{1}\) score of 0.8035 (23.4 p.p. of improvement).
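The improvements in the abstract are stated in percentage points, which are absolute differences between scores rather than relative gains. The sketch below only illustrates that arithmetic: the dictionary holds the four reported score and gain pairs, while the function names and the derived "implied previous" values are illustrative and are not figures reported by the authors.

```python
# Reported (score, improvement in percentage points) pairs from the abstract.
REPORTED = {
    ("TweetSentBR", "accuracy"): (0.7726, 6.3),
    ("TweetSentBR", "F1"): (0.7514, 9.5),
    ("2000-tweets-BR", "accuracy"): (0.8247, 14.5),
    ("2000-tweets-BR", "F1"): (0.8035, 23.4),
}


def implied_previous(score, gain_pp):
    """Score implied for the earlier result: an absolute difference of gain_pp points."""
    return round(score - gain_pp / 100.0, 4)


def relative_gain(score, gain_pp):
    """Relative improvement over the implied earlier score (not what p.p. measures)."""
    previous = score - gain_pp / 100.0
    return (score - previous) / previous


if __name__ == "__main__":
    for (dataset, metric), (score, gain_pp) in REPORTED.items():
        print(
            f"{dataset} {metric}: reported {score:.4f}, "
            f"implied previous {implied_previous(score, gain_pp):.4f}, "
            f"relative gain {relative_gain(score, gain_pp):.1%}"
        )
```

The implied values are subject to rounding in the reported numbers, so they should be read as approximations.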
Notes
- 1. Jessica is beautiful and has a sense of humor.
- 2. When was that? Didn’t she win with a gnocchi recipe?
- 3. That’s terrible... it’s like a horror freak show.
- 4. Emoticons considered: S2 <3 ;D :D ;-) :-) ;) =) :) ;-( :-( ;( =( :(
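As a minimal sketch of how the emoticons in this list might be located in a tweet, the following uses a regular expression built from the fourteen strings above. The function name `find_emoticons` and the matching strategy are assumptions made for illustration; the source does not describe how the authors detect emoticons.

```python
import re

# The fourteen emoticons listed in the note, in the order given there.
EMOTICONS = [
    "S2", "<3", ";D", ":D", ";-)", ":-)", ";)", "=)", ":)",
    ";-(", ":-(", ";(", "=(", ":(",
]

# Escape each emoticon so characters such as "(" and ")" are matched literally.
EMOTICON_RE = re.compile("|".join(re.escape(e) for e in EMOTICONS))


def find_emoticons(text):
    """Return every listed emoticon found in text, in order of appearance."""
    return EMOTICON_RE.findall(text)


if __name__ == "__main__":
    print(find_emoticons("adorei :-) mas que pena :("))
```

This plain substring match would also find "S2" inside a longer token such as "PS2"; a real pipeline would likely need token boundaries, which this sketch leaves out.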
Acknowledgements
The authors would like to thank FAPESP (grants #2015/11937-9, #2017/12646-3, #2017/16246-0 and #2019/20875-8), CNPq (grants #304380/2018-0 and #309330/2018-1) and CAPES for their financial support.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
de Barros, T.M., Pedrini, H., Dias, Z. (2021). Data-Augmented Emoji Approach to Sentiment Classification of Tweets. In: Tavares, J.M.R.S., Papa, J.P., González Hidalgo, M. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2021. Lecture Notes in Computer Science, vol 12702. Springer, Cham. https://doi.org/10.1007/978-3-030-93420-0_7
DOI: https://doi.org/10.1007/978-3-030-93420-0_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93419-4
Online ISBN: 978-3-030-93420-0
eBook Packages: Computer Science, Computer Science (R0)