DOI: 10.1145/3617023.3617039
research-article

CluSent – Combining Semantic Expansion and De-Noising for Dataset-Oriented Sentiment Analysis of Short Texts

Published: 23 October 2023

ABSTRACT

The lack of sufficient information, especially in short texts, is a major challenge to building effective sentiment models. Short texts can be enriched with more complex semantic relationships that better capture affective information, but this enrichment may have the undesired side effect of introducing noise into the data. This work proposes CluSent, a new strategy for customized, dataset-oriented sentiment analysis that exploits CluWords, a powerful, recently proposed concept for representing semantically related words. CluSent tackles both information shortage and noise by (i) exploiting the semantic neighborhood of a given pre-trained word embedding to enrich the document representation and (ii) introducing dataset-oriented filtering and weighting mechanisms that cope with noise by taking advantage of the polarity and intensity information from lexicons. In our experimental evaluation, considering 19 datasets, five state-of-the-art baselines (including modern transformer architectures), and two metrics, CluSent was the best method in 30 of the 38 dataset-metric combinations (19 datasets × 2 metrics), with significant gains (over 14%) over the strongest baselines.
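
The two steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy embeddings, the lexicon scores, the similarity threshold, the sign-conflict filter, and the weighting formula `sim * (1 + |polarity|)` are all assumptions chosen only to show the general idea of (i) expanding a word with its embedding neighbors and (ii) filtering and weighting those neighbors with lexicon information.

```python
import math

# Toy 2-d "pre-trained" embeddings (hypothetical values, for illustration only).
EMBEDDINGS = {
    "good":  (0.90, 0.10),
    "great": (0.85, 0.20),
    "bad":   (0.80, 0.30),  # close to "good" in this space, but opposite polarity
    "movie": (0.10, 0.95),
    "film":  (0.15, 0.90),
}

# Toy sentiment lexicon: polarity/intensity in [-1, 1] (hypothetical scores).
LEXICON = {"good": 0.6, "great": 0.8, "bad": -0.7}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def expand(word, threshold=0.9):
    """Return {neighbor: weight} for one word.

    Step (i): keep words whose cosine similarity to `word` is >= threshold.
    Step (ii): drop neighbors whose lexicon polarity has the opposite sign
    (assumed filter), then weight the rest by sim * (1 + |polarity|)
    (assumed weighting).
    """
    if word not in EMBEDDINGS:
        return {}
    seed_polarity = LEXICON.get(word, 0.0)
    weights = {}
    for other, vec in EMBEDDINGS.items():
        sim = cosine(EMBEDDINGS[word], vec)
        if sim < threshold:
            continue  # not in the semantic neighborhood
        polarity = LEXICON.get(other, 0.0)
        if seed_polarity * polarity < 0:
            continue  # conflicting polarity: treated as noise
        weights[other] = sim * (1.0 + abs(polarity))
    return weights


def represent(tokens, threshold=0.9):
    """Build an expanded bag-of-words for a short text by summing the weights."""
    doc = {}
    for token in tokens:
        for term, weight in expand(token, threshold).items():
            doc[term] = doc.get(term, 0.0) + weight
    return doc


if __name__ == "__main__":
    print(sorted(represent(["good", "movie"])))
```

With these toy values, the two-word text "good movie" gains the neighbors "great" and "film". "bad" is excluded even though it lies close to "good" in the embedding space, which is the kind of noise the lexicon-based filter is meant to remove.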


    • Published in

      WebMedia '23: Proceedings of the 29th Brazilian Symposium on Multimedia and the Web
      October 2023
      285 pages
      ISBN: 9798400709081
      DOI: 10.1145/3617023

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 270 of 873 submissions, 31%