research-article

Leveraging emoji to improve sentiment classification of tweets

Authors: Tiago Martinho de Barros, Zanoni Dias

SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing

Pages 845 - 852

https://doi.org/10.1145/3412841.3441960

Published: 22 April 2021

Abstract

Recent advances in the Natural Language Processing field have yielded good results on a number of tasks, for instance, Linguistic Acceptability, Question Answering, Reading Comprehension, Natural Language Inference, and Sentiment Analysis. Methods such as ULMFiT, ELMo, BERT, and their derivatives have achieved increasing success on these tasks, but they often require substantial amounts of pre-training data and computational resources. We propose a novel methodology to classify the sentiment of tweets, based on BERT but focusing on emojis, treating them as an important source of sentiment rather than as simple input tokens. Additionally, a previously pre-trained BERT model can be used to warm-start ours, greatly reducing the required training time. Experiments on two Brazilian Portuguese datasets (TweetSentBR and 2000-tweets-BR) show that our methodology produces better results than BERT and outperforms the previously published results for both datasets. It thus establishes new state-of-the-art results on TweetSentBR, with an accuracy of 0.7577 (a 4.8 percentage point absolute improvement) and an F₁ score of 0.7395 (an 8.4 percentage point absolute improvement), and on 2000-tweets-BR, with an accuracy of 0.8316 (a 15.2 percentage point absolute improvement) and an F₁ score of 0.8151 (a 24.5 percentage point absolute improvement).
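The abstract does not say how emojis are separated from the rest of a tweet before being treated as a distinct source of sentiment. The Python sketch below shows one hypothetical way to perform that preprocessing step. The names `split_emoji`, `EMOJI_PATTERN`, and `EMOJI_MODIFIERS` are illustrative assumptions, not the authors' code, and the character ranges cover only common Unicode emoji blocks: regional-indicator flags are ignored, and skin-tone modifiers are counted as separate emoji.

```python
import re
from typing import List, Tuple

# Hypothetical, simplified emoji matcher covering a few common Unicode blocks.
# Real emoji handling (ZWJ sequences, skin tones, flags) is more involved.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]"
)
# Zero-width joiner and variation selector-16 only modify neighbouring emoji.
EMOJI_MODIFIERS = re.compile("[\u200D\uFE0F]")


def split_emoji(tweet: str) -> Tuple[str, List[str]]:
    """Return the tweet text without emoji and the emoji in order of appearance."""
    cleaned = EMOJI_MODIFIERS.sub("", tweet)
    emojis = EMOJI_PATTERN.findall(cleaned)
    text = EMOJI_PATTERN.sub(" ", cleaned)
    text = " ".join(text.split())  # collapse whitespace left where emoji were removed
    return text, emojis


if __name__ == "__main__":
    text, emojis = split_emoji("Que dia lindo 😍😍 vamos à praia ☀️")
    print(text)    # Que dia lindo vamos à praia
    print(emojis)  # ['😍', '😍', '☀']
```

With the emoji isolated, the remaining text could go through a standard BERT tokenizer while the emoji are represented by a separate component. That division of work is an assumption made for illustration; the abstract does not describe the model architecture.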

References

[1]

Henrico Brum and Maria das Graças Volpe Nunes. 2018. Building a Sentiment Corpus of Tweets in Brazilian Portuguese. In 11th International Conference on Language Resources and Evaluation (LREC). European Language Resources Association (ELRA), ELRA, Miyazaki, Japan, 4167--4172.

[2]

Henrico Bertini Brum and Maria das Graças Volpe Nunes. 2018. Semi-supervised Sentiment Annotation of Large Corpora. In 13th International Conference on Computational Processing of the Portuguese Language (PROPOR). Organization Committee of the International Conference on Computational Processing of the Portuguese Language (OC-PROPOR), Springer, Canela, Brazil, 385--395.

[3]

Andrew M Dai and Quoc V Le. 2015. Semi-Supervised Sequence Learning. In 29th Conference on Neural Information Processing Systems (NIPS). Neural Information Processing Systems (NIPS) Foundation, Curran Associates, Inc., Montréal, Canada, 3079--3087.

[4]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In 20th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT): Long and Short Papers - Volume 1. Association for Computational Linguistics (ACL), ACL, Minneapolis, USA, 4171--4186.

[5]

Alec Go, Richa Bhayani, and Lei Huang. 2009. Twitter Sentiment Classification using Distant Supervision. CS224N project report, Stanford 1, 12 (2009), 1--6.

[6]

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Computation 9, 8 (1997), 1735--1780.

[7]

Jeremy Howard and Sebastian Ruder. 2018. Universal Language Model Fine-Tuning for Text Classification. In 56th Annual Meeting of the Association for Computational Linguistics (ACL): Long Papers - Volume 1. Association for Computational Linguistics (ACL), ACL, Melbourne, Australia, 328--339.

[8]

Nozomi Kobayashi, Kentaro Inui, and Yuji Matsumoto. 2007. Opinion Mining from Web Documents: Extraction and Structurization. Information and Media Technologies 2, 1 (2007), 326--337.

[9]

Bing Liu. 2012. Sentiment Analysis and Opinion Mining. Synthesis Lectures on Human Language Technologies 5, 1 (2012), 1--167.

[10]

Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In 7th International Conference on Learning Representations (ICLR). International Conference on Learning Representations (ICLR), ICLR, New Orleans, USA, 1--10.

[11]

Hannah Miller, Jacob Thebault-Spieker, Shuo Chang, Isaac Johnson, Loren Terveen, and Brent Hecht. 2016. "Blissfully Happy" or "Ready to Fight": Varying Interpretations of Emoji. In 10th International Conference on Web and Social Media (ICWSM). Association for the Advancement of Artificial Intelligence (AAAI), AAAI Press, Cologne, Germany, 259--268.

[12]

Satoshi Morinaga, Kenji Yamanishi, Kenji Tateishi, and Toshikazu Fukushima. 2002. Mining Product Reputations on the Web. In 8th International Conference on Knowledge Discovery and Data Mining (KDD). Association for Computing Machinery (ACM), ACM, Edmonton, Canada, 341--349.

[13]

Makoto Nakatsuji and Yasuhiro Fujiwara. 2014. Linked Taxonomies to Capture Users' Subjective Assessments of Items to Facilitate Accurate Collaborative Filtering. Artificial Intelligence 207 (2014), 52--68.

[14]

Paulo de Assis Nascimento. 2019. Aplicando Ensemble para Classificação de Textos Curtos em Português do Brasil [Applying Ensembles to the Classification of Short Texts in Brazilian Portuguese]. Master's thesis. Universidade Federal de Pernambuco, Recife, Brazil.

[15]

Tetsuya Nasukawa and Jeonghee Yi. 2003. Sentiment Analysis: Capturing Favorability Using Natural Language Processing. In 2nd International Conference on Knowledge Capture (K-CAP). Association for Computing Machinery (ACM), ACM, Florida, USA, 70--77.

[16]

Juri Opitz and Sebastian Burst. 2019. Macro F1 and Macro F1. Computing Research Repository abs/1911.03347 (2019), 1--12.

[17]

Alexander Pak and Patrick Paroubek. 2010. Twitter as a Corpus for Sentiment Analysis and Opinion Mining. In 7th International Conference on Language Resources and Evaluation (LREC). European Language Resources Association (ELRA), ELRA, Valletta, Malta, 1320--1326.

[18]

Bo Pang and Lillian Lee. 2005. Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales. In 43rd Annual Meeting of the Association for Computational Linguistics (ACL). Association for Computational Linguistics (ACL), ACL, Ann Arbor, USA, 115--124.

[19]

Matthew E Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep Contextualized Word Representations. In 19th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT): Long Papers - Volume 1. Association for Computational Linguistics (ACL), ACL, New Orleans, USA, 2227--2237.

[20]

Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving Language Understanding by Generative Pre-Training. Technical Report. OpenAI.

[21]

Kenzo Miranda Sakiyama, Andre Quintiliano Bezerra Silva, and Edson Takashi Matsubara. 2019. Twitter Breaking News Detector in the 2018 Brazilian Presidential Election using Word Embeddings and Convolutional Neural Networks. In 37th International Joint Conference on Neural Networks (IJCNN). Institute of Electrical and Electronics Engineers (IEEE), IEEE, Budapest, Hungary, 1--8.

[22]

Fabio Souza, Rodrigo Nogueira, and Roberto Lotufo. 2019. Portuguese Named Entity Recognition using BERT-CRF. Computing Research Repository abs/1909.10649 (2019), 1--8.

[23]

Wilson L Taylor. 1953. "Cloze Procedure": A New Tool for Measuring Readability. Journalism Quarterly 30, 4 (1953), 415--433.

[24]

Mikalai Tsytsarau and Themis Palpanas. 2012. Survey on Mining Subjective Data on the Web. Data Mining and Knowledge Discovery 24, 3 (2012), 478--514.

[25]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All You Need. In 31st Conference on Neural Information Processing Systems (NIPS). Neural Information Processing Systems (NIPS) Foundation, Curran Associates, Inc., Long Beach, USA, 5998--6008.

[26]

Douglas Vitório, Ellen Souza, Ingryd Teles, and Adriano LI Oliveira. 2017. Investigating Opinion Mining through Language Varieties: a Case Study of Brazilian and European Portuguese tweets. In 11th Brazilian Symposium in Information and Human Language Technology (STIL). Sociedade Brasileira de Computação (SBC), SBC, Uberlândia, Brazil, 43--52.

[27]

Jorge A Wagner Filho, Rodrigo Wilkens, Marco Idiart, and Aline Villavicencio. 2018. The brWaC Corpus: A New Open Resource for Brazilian Portuguese. In 11th International Conference on Language Resources and Evaluation (LREC). European Language Resources Association (ELRA), ELRA, Miyazaki, Japan, 4339--4344.

[28]

Hao Wang, Doğan Can, Abe Kazemzadeh, François Bar, and Shrikanth Narayanan. 2012. A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle. In 50th Annual Meeting of the Association for Computational Linguistics (ACL). Association for Computational Linguistics (ACL), ACL, Jeju, Republic of Korea, 115--120.

[29]

Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. Computing Research Repository abs/1609.08144 (2016), 1--23.

[30]

Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V Le. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In 33rd Conference on Neural Information Processing Systems (NeurIPS). Neural Information Processing Systems (NeurIPS) Foundation, Curran Associates, Inc., Vancouver, Canada, 5753--5763.

[31]

Lei Zhang, Shuai Wang, and Bing Liu. 2018. Deep Learning for Sentiment Analysis: A Survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8, 4 (2018), e1253.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Information

Published In

SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing

March 2021

2075 pages

ISBN:9781450381048

DOI:10.1145/3412841

Conference Chairs: Chih-Cheng Hung (Kennesaw State University), Jiman Hong (Soongsil University, South Korea)

Program Chairs: Alessio Bechini (University of Pisa, Italy), Eunjee Song (Baylor University)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGAPP: ACM Special Interest Group on Applied Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

CNPq
FAPESP

Conference

SAC '21

Sponsor:

SIGAPP

SAC '21: The 36th ACM/SIGAPP Symposium on Applied Computing

March 22 - 26, 2021

Virtual Event, Republic of Korea

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Article Metrics

Total Citations: 6
Total Downloads: 174
Downloads (Last 12 months): 20
Downloads (Last 6 weeks): 4

Reflects downloads up to 05 Mar 2025

Cited By

D Hanny and B Resch. 2024. Clustering-Based Joint Topic-Sentiment Modeling of Social Media Data: A Neural Networks Approach. Information 15, 4 (2024), 200. Online publication date: 4 April 2024. https://doi.org/10.3390/info15040200

Q Xu, C Jayne, and V Chang. 2024. An emoji feature-incorporated multi-view deep learning for explainable sentiment classification of social media reviews. Technological Forecasting and Social Change 202 (2024), 123326. Online publication date: May 2024. https://doi.org/10.1016/j.techfore.2024.123326

M Piau, R Lotufo, and R Nogueira. 2024. ptt5-v2: A Closer Look at Continued Pretraining of T5 Models for the Portuguese Language. In Intelligent Systems, 324--338. Online publication date: 17 November 2024. https://dl.acm.org/doi/10.1007/978-3-031-79032-4_23

S Sankari and S Priscila. 2024. A Deep Learning Based Emoticon Classification for Social Media Comment Analysis. In Advancements in Smart Computing and Information Security, 313--328. Online publication date: 2 May 2024. https://doi.org/10.1007/978-3-031-59097-9_23

P Khan, I Razzak, A Dengel, and S Ahmed. 2023. Performance Comparison of Transformer-Based Models on Twitter Health Mention Classification. IEEE Transactions on Computational Social Systems 10, 3 (2023), 1140--1149. Online publication date: June 2023. https://doi.org/10.1109/TCSS.2022.3143768

D Vianna, F Carneiro, J Carvalho, A Plastino, and A Paes. 2023. Sentiment analysis in Portuguese tweets: an evaluation of diverse word representation models. Language Resources and Evaluation 58, 1 (2024), 223--272. Online publication date: 28 June 2023. https://dl.acm.org/doi/10.1007/s10579-023-09661-4