CluSent – Combining Semantic Expansion and De-Noising for Dataset-Oriented Sentiment Analysis of Short Texts

Authors:
Felipe Viegas, Universidade Federal de Minas Gerais, Brazil (ORCID 0000-0001-8121-8607)
Sergio Canuto, Instituto Federal de Goiás, Brazil (ORCID 0000-0003-2973-4158)
Washington Cunha, Universidade Federal de Minas Gerais, Brazil (ORCID 0000-0002-1988-8412)
Celso França, Universidade Federal de Minas Gerais, Brazil (ORCID 0000-0002-0251-7172)
Claudio Valiense, Universidade Federal de Minas Gerais, Brazil (ORCID 0000-0002-7366-2633)
Leonardo Rocha, Universidade Federal de São João del-Rei, Brazil (ORCID 0000-0002-4913-4902)
Marcos André Gonçalves, Universidade Federal de Minas Gerais, Brazil (ORCID 0000-0002-2075-3363)

WebMedia '23: Proceedings of the 29th Brazilian Symposium on Multimedia and the Web, October 2023, Pages 110–118
https://doi.org/10.1145/3617023.3617039

Published: 23 October 2023

ABSTRACT

The lack of sufficient information, especially in short texts, is a major challenge to building effective sentiment models. Short texts can be enriched with more complex semantic relationships that better capture affective information, but this enrichment has a potential undesired side effect: it can introduce noise into the data. This work proposes CluSent, a new strategy for customized dataset-oriented sentiment analysis that exploits CluWords, a powerful, recently proposed concept for representing semantically related words. CluSent tackles the issues of information shortage and noise by (i) exploiting the semantic neighborhood of a given pre-trained word embedding to enrich the document representation and (ii) introducing dataset-oriented filtering and weighting mechanisms that cope with noise by taking advantage of the polarity and intensity information from lexicons. In our experimental evaluation, considering 19 datasets, five state-of-the-art baselines (including modern transformer architectures), and two metrics, CluSent was the best method in 30 out of 38 possibilities (19 datasets × 2 metrics), with significant gains (over 14%) over the strongest baselines.
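
The two ideas named in the abstract, semantic expansion and lexicon-based de-noising, can be illustrated with a small sketch. This is not the authors' implementation: the cosine-similarity threshold `alpha`, the toy 2-D vectors, the lexicon scores, and the specific filtering and weighting rules are all assumptions chosen only to make the idea concrete. The toy vectors deliberately place "good" and "bad" close together, since antonyms often end up as near neighbors in embedding spaces, which is the kind of noise a polarity filter could remove.

```python
import numpy as np

# Toy 2-D "pre-trained" embeddings (hypothetical values, illustration only).
EMBEDDINGS = {
    "good": np.array([1.0, 0.2]),
    "great": np.array([1.0, 0.3]),
    "bad": np.array([1.0, 0.1]),
    "movie": np.array([0.1, 1.0]),
}

# Toy polarity lexicon with scores in [-1, 1] (hypothetical); absent means neutral.
LEXICON = {"good": 0.6, "great": 0.8, "bad": -0.6}

VOCAB = sorted(EMBEDDINGS)  # ['bad', 'good', 'great', 'movie']


def cosine_matrix(vocab, embeddings):
    """Pairwise cosine similarity between all vocabulary words."""
    mat = np.stack([embeddings[w] for w in vocab])
    unit = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    return unit @ unit.T


def expansion_matrix(vocab, embeddings, lexicon, alpha=0.8):
    """Row = source word, column = neighbor, value = neighbor weight."""
    sim = cosine_matrix(vocab, embeddings)
    # (i) Semantic expansion: keep only neighbors with similarity >= alpha.
    sim = np.where(sim >= alpha, sim, 0.0)
    # (ii) De-noising (assumed rule): drop neighbors whose lexicon polarity
    # has the opposite sign to the source word's polarity.
    pol = np.array([lexicon.get(w, 0.0) for w in vocab])
    opposite = np.sign(pol)[:, None] * np.sign(pol)[None, :] < 0
    sim[opposite] = 0.0
    # Weighting (assumed rule): scale each neighbor by its lexicon intensity;
    # neutral words keep weight 1.
    weight = np.where(pol != 0, np.abs(pol), 1.0)
    return sim * weight[None, :]


def represent(tokens, vocab, expansion):
    """Expand a bag-of-words term-frequency vector through the matrix."""
    index = {w: i for i, w in enumerate(vocab)}
    tf = np.zeros(len(vocab))
    for t in tokens:
        if t in index:
            tf[index[t]] += 1
    return tf @ expansion


if __name__ == "__main__":
    exp = expansion_matrix(VOCAB, EMBEDDINGS, LEXICON)
    print(dict(zip(VOCAB, represent(["good", "movie"], VOCAB, exp).round(3))))
```

In this sketch, the short text "good movie" gains a non-zero weight for "great" (a same-polarity neighbor), while "bad" is filtered out despite being close to "good" in the toy embedding space.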

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Sentiment analysis

Published in

WebMedia '23: Proceedings of the 29th Brazilian Symposium on Multimedia and the Web
October 2023
285 pages
ISBN: 9798400709081
DOI: 10.1145/3617023

Copyright © 2023 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Classification
Natural Language Processing
Sentiment Analysis
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 270 of 873 submissions, 31%