ABSTRACT
The lack of sufficient information, especially in short texts, is a major challenge to building effective sentiment models. Short texts can be enriched with more complex semantic relationships that better capture affective information, at the risk of introducing noise into the data. This work proposes a new strategy for customized dataset-oriented sentiment analysis, CluSent, that exploits a powerful, recently proposed concept for representing semantically related words: CluWords. CluSent tackles both information shortage and noise by (i) exploiting the semantic neighborhood of a given pre-trained word embedding to enrich the document representation and (ii) introducing dataset-oriented filtering and weighting mechanisms that cope with noise by taking advantage of the polarity and intensity information from lexicons. In our experimental evaluation, considering 19 datasets, five state-of-the-art baselines (including modern transformer architectures), and two metrics, CluSent was the best method in 30 out of 38 possibilities, with significant gains (over 14%) over the strongest baselines.
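The two ideas named in the abstract, neighborhood-based expansion and lexicon-based filtering and weighting, can be illustrated with a minimal Python sketch. This is not the CluSent implementation: the toy embeddings, the lexicon values, the similarity threshold, and the names `neighbors` and `expand_document` are all hypothetical, and the weighting rule (cosine similarity times absolute lexicon intensity) is a simplification chosen only for illustration.

```python
import math

# Toy 2-D "pre-trained" embeddings (hypothetical values, illustration only).
EMBEDDINGS = {
    "good": (0.9, 0.1),
    "great": (0.85, 0.2),
    "bad": (-0.9, 0.1),
    "awful": (-0.85, 0.2),
    "movie": (0.0, 1.0),
}

# Toy sentiment lexicon: signed polarity intensity in [-1, 1] (hypothetical).
LEXICON = {"good": 0.6, "great": 0.8, "bad": -0.6, "awful": -0.9}


def cosine(u, v):
    """Cosine similarity between two vectors given as tuples."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def neighbors(word, threshold=0.9):
    """Map each vocabulary word to its similarity with `word`,
    keeping only those at or above `threshold` (includes `word` itself)."""
    if word not in EMBEDDINGS:
        return {}
    target = EMBEDDINGS[word]
    result = {}
    for other, vec in EMBEDDINGS.items():
        sim = cosine(target, vec)
        if sim >= threshold:
            result[other] = sim
    return result


def expand_document(tokens, threshold=0.9):
    """Enrich a bag of words with lexicon-weighted semantic neighbors."""
    features = {}
    for token in tokens:
        for neighbor, sim in neighbors(token, threshold).items():
            # Assumed filtering rule: neighbors absent from the lexicon
            # are treated as noise and dropped.
            intensity = abs(LEXICON.get(neighbor, 0.0))
            if intensity == 0.0:
                continue
            # Assumed weighting rule: similarity scaled by lexicon intensity.
            features[neighbor] = features.get(neighbor, 0.0) + sim * intensity
    return features


features = expand_document(["good", "movie"])
```

In this toy run, the short text "good movie" gains the related term "great" through the embedding neighborhood, while "movie", which carries no lexicon polarity, is filtered out. A real system would use a full pre-trained embedding and lexicon and would tune the threshold per dataset.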
Index Terms
- CluSent – Combining Semantic Expansion and De-Noising for Dataset-Oriented Sentiment Analysis of Short Texts