DOI: 10.1145/3412841.3441960

Leveraging emoji to improve sentiment classification of tweets

Published: 22 April 2021

Abstract

Recent advances in Natural Language Processing have produced strong results on a number of tasks, including Linguistic Acceptability, Question Answering, Reading Comprehension, Natural Language Inference, and Sentiment Analysis. Methods such as ULMFiT, ELMo, BERT, and their derivatives have achieved increasing success on these tasks, but they often require substantial amounts of pre-training data and computational resources. We propose a novel methodology to classify the sentiment of tweets that is based on BERT but focuses on emojis, treating them as an important source of sentiment rather than as ordinary input tokens. Additionally, a previously pre-trained BERT model can be used to warm-start ours, greatly reducing the training time required. Experiments on two Brazilian Portuguese datasets, TweetSentBR and 2000-tweets-BR, show that our methodology produces better results than BERT and outperforms the previously published results for both datasets. It establishes new state-of-the-art results on TweetSentBR, with an accuracy of 0.7577 (4.8 percentage points absolute improvement) and an F1 score of 0.7395 (8.4 percentage points absolute improvement), and on 2000-tweets-BR, with an accuracy of 0.8316 (15.2 percentage points absolute improvement) and an F1 score of 0.8151 (24.5 percentage points absolute improvement).
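
The figures above are accuracy, F1, and absolute gains in percentage points. As a minimal sketch of how such numbers are typically computed, the Python below assumes macro-averaged F1 (the abstract does not state which averaging is used) and runs on made-up labels. The function names are illustrative and do not come from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging, see lead-in)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def gain_in_points(new_score, old_score):
    """Absolute improvement in percentage points for scores in [0, 1]."""
    return (new_score - old_score) * 100


# Toy three-class example; these labels are invented for illustration.
gold = ["pos", "neg", "neu", "pos"]
pred = ["pos", "neg", "pos", "pos"]
print(accuracy(gold, pred), macro_f1(gold, pred))
```

An absolute gain is a plain difference of scores, so a move from 0.70 to 0.75 is 5 percentage points regardless of the baseline's size, which is how the improvements in the abstract should be read.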

    Published In

    SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing
    March 2021
    2075 pages
    ISBN: 9781450381048
    DOI: 10.1145/3412841

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. emoji
    2. natural language processing
    3. sentiment analysis
    4. social media

    Qualifiers

    • Research-article

    Funding Sources

    • CNPq
    • FAPESP

    Conference

    SAC '21
    SAC '21: The 36th ACM/SIGAPP Symposium on Applied Computing
    March 22 - 26, 2021
    Virtual Event, Republic of Korea

    Acceptance Rates

    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
