DOI: 10.1145/3639233.3639353 · NLPIR conference proceedings · Research article

Classifying Sentiments on Social Media Texts: A GPT-4 Preliminary Study

Published: 5 March 2024

ABSTRACT

In today's digital age, social media has become a hub for people to express their thoughts and feelings. Sentiment classification discerns public opinions and trends to understand how people feel about a given topic. Accurate sentiment classification on large datasets often necessitates human-annotated training data, which can be costly and time-consuming to obtain. Large Language Models (LLMs), such as OpenAI's Generative Pre-trained Transformer models, have surged in popularity due to their capability to understand a given task. In this preliminary study, we report the performance of the latest OpenAI GPT-4 on classifying sentiments in a social media dataset using zero- and one-shot learning approaches. Notably, the one-shot approach, whose English prompt mimics the instructions designed for human annotators, achieved substantial agreement (κ = 0.77) with human annotations and high accuracy, precision, and recall, even without explicit training data. Meanwhile, a fine-tuned mBERT model yielded lower evaluation scores than GPT-4. Our findings provide foundational insights into the strengths and limitations of GPT-4 for sentiment classification on social media data, setting the groundwork for broader future research in this field.
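As context for the agreement figure reported above, the following is a minimal sketch of how Cohen's kappa between human and model labels can be computed; the label sequences here are illustrative examples, not the paper's data:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution
    ca, cb = Counter(labels_a), Counter(labels_b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

human = ["pos", "neg", "neu", "pos", "neg", "pos", "neu", "neg"]
model = ["pos", "neg", "neu", "pos", "pos", "pos", "neu", "neg"]
print(round(cohens_kappa(human, model), 3))  # → 0.81
```

A κ of 0.77, as reported for the one-shot prompt, falls in the "substantial agreement" band (0.61–0.80) of the commonly used Landis–Koch interpretation scale.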


Published in

NLPIR '23: Proceedings of the 2023 7th International Conference on Natural Language Processing and Information Retrieval
December 2023, 336 pages
ISBN: 9798400709227
DOI: 10.1145/3639233

Copyright © 2023 ACM. Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher: Association for Computing Machinery, New York, NY, United States
